




Vol. XXXVI, Parts I and II; June 1949


BIOMETRIKA 


FOUNDED BY 


W. F. R. WELDON, FRANCIS GALTON AND KARL PEARSON


MANAGING EDITOR 


E. S. PEARSON


ASSOCIATE EDITORS 


M. G. KENDALL JOHN WISHART 
in consultation with 
HARALD CRAMER MAJOR GREENWOOD 
R. C. GEARY J. B. S. HALDANE


Reprinted by offset-litho, 1960 


ISSUED BY 
THE BIOMETRIKA OFFICE, UNIVERSITY COLLEGE, LONDON


PRINTED AT THE UNIVERSITY PRESS, CAMBRIDGE 


[Issued 22 July 1949] 






















VOLUME XXXVI, PARTS I AND II





THE INFECTIOUSNESS OF MEASLES 


By MAJOR GREENWOOD 


Measles, once a deadly, still a very common disease, is a worthy object of study for the 
biometrician. Although it is not now a very important cause of death or invalidity, it has been 
and may again become important. Then it is a virus disease, and virus diseases, for instance, 
influenza and poliomyelitis, are very important killing or maiming diseases; if we really knew 
precisely how measles spreads, that knowledge might help us to understand how these more 
serious illnesses are passed on. The statistical literature of measles is enormous, but, as 
I shall illustrate, it is easy to misinterpret unhomogeneous data, and really precise observa- 
tions of the natural history of measles are rare. 

Some opinions on the aetiology of measles are old and universally held by physicians. The 
first is that the length of time during which an infected person can pass the virus to another 
person is short and is over when the patient shows the characteristic signs of the 
disease—the typical rash, etc. The second is that the interval between the moment of in- 
fection, viz. reception of an effective dose of virus and the appearance of symptoms or signs of 
illness, is about 14 days; ‘From 7 to 18 days; oftenest 14’ is a common statement. The third 
is that droplet infection by coughing, spitting or contamination from sputum, etc., is 
responsible for the immense majority of infections. I see no reason to doubt that these 
statements are, broadly speaking, true; they are, however, vague. According to N.E.D., 
incubation is ‘The process or phase through which the germs of disease pass between con- 
tagion or inoculation and the development of first symptoms’. Symptoms, of course, are 
subjective, but probably the lexicographer includes physical signs—e.g. running from eyes or 
nose, rash, etc. This interval can only be precisely determined when the child has had but one 
exposure to another infected child. 

Such precise information is not abundant in the enormous ‘literature’ of measles because it 
implies exact knowledge of contacts, which is only to be had in country districts where the 
medical practitioner is fully acquainted with the social habits of his patients. Excellent
examples are to be found in Dr W. N. Pickles’s book, Epidemiology in Country Practice 
(1939, see pp. 32-6). On the other hand, abundant data of intervals between successive cases 
in families are available. In the second column of Table 1 are the figures provided by Stocks 
& Karn (1928); these are obtained from the record cards of all cases of measles notified in the 
Metropolitan Borough of St Pancras from March 1924 to March 1927. The authors say 
explicitly that the interval is that between the first appearance of the rash in successive 
cases. In this frequency distribution all the data are included. A family with two infected 
children could only provide a single item, but a family with three would provide two and so 
on. A medical reader would hold that none of the first four or five frequencies included 
children really infected within the family, but that they, like the first child in the family to go
down, caught the disease elsewhere. Then, noting that from the sixth day onwards the 
frequencies increase to a maximum and decrease, he would say that, from the sixth day 
onwards, the proportion really infected within the family increased. Of course a biometrician 

wishes to do better than this; he would like to dissect the compound frequency into two 
components. 


The easiest of all dissections is into two Poisson frequencies (see W. Schilling, 1947). I tried 
this on the St Pancras data but the result was execrable; the summed frequencies from 
interval 9 onwards were quite good fits, but from 1 to 8 hopeless—the computed frequencies 
were 310, 143, 56, 47, 80, 136, 204, 266. A better result was reached by pure empiricism. I 
used all the frequencies from interval 7 as they stood, guessed values—4, 10, 22, 43, 74, 116— 
for the first six and then fitted a Pearsonian type III to the product of my cookery. It was 
not too bad, but of no scientific value. My algebra is not equal to fitting a type III curve 
scientifically from its tail. But even when that problem has been solved, it does not seem 


Table 1. Intervals (days) between successive cases of measles in families
(intervals of 0 or of more than 20 days included)

Interval   St Pancras      %     Providence, R.I.      %     Providence, R.I.         %
 (days)                                                      (families of three)
    1          341        9.88         253            6.07           61             8.54
    2          246        7.13         184            4.42           43             6.02
    3          160        4.64         108            2.59           21             2.94
    4          117        3.39         110            2.64           25             3.50
    5           96        2.78         111            2.66           20             2.80
    6           90        2.61         114            2.74           20             2.80
    7           99        2.87         189            4.54           40             5.60
    8          173        5.01         247            5.92           41             5.74
    9          206        5.97         388            9.31           75            10.50
   10          318        9.22         519           12.46          110            15.41
   11          353       10.23         511           12.27           72            10.08
   12          329        9.54         451           10.83           66             9.24
   13          269        7.80         344            8.26           53             7.42
   14          205        5.94         244            5.86           39             5.46
   15          153        4.43         142            3.41           14             1.96
   16          104        3.01         105            2.52            8             1.12
   17           70        2.03          58            1.39            2             0.28
   18           48        1.39          37            0.89            0             0.00
   19           38        1.10          25            0.60            2             0.28
   20           35        1.01          26            0.62            2             0.28
 Totals       3450                    4166                          714





























that we should have a unique frequency distribution of intervals. The third column of the 
table gives the findings of Wilson, Bennett, Allen & Worcester (1938) for Providence, Rhode 
Island, 1929-34, and the fifth column their counts of intervals between first and second cases 
in families in which three children were affected. Wilson et al. do not explicitly say how they 
measured interval—they may, perhaps, have taken the interval between first signs of illness 
instead of between dates of rash—but I do not think the point material. It is obvious that the 
total Providence frequency is not congruent with the St Pancras frequency, and that the 
Providence selection is not congruent with the total. I am speaking from the biometric 
standpoint, i.e. that no two of them could be regarded (using the $\chi^2$ test) as drawn from a
common population (naturally in comparing the Providence sets one has to subtract the 
frequencies of the selection from those of the total). But the unstatistical epidemiologist 
would say, with respect to the totals, that they are very similar; it is true that the St Pancras 







mode is a day later than the Providence mode and much taller, but at least we can say that 
the most frequent interval is 10 or 11 days and the decline on the short side faster than on the 
long side. This is arithmetically vague, but good enough to justify us in thinking that in very 
few cases following a ‘primary’ within 5 or 6 days the patients were infected by the ‘primary’,
and a great majority of those which were more than 8 days later were so infected. That 
information is enough to show how easily one may draw false conclusions from incomplete 
information, as I shall illustrate on a blunder of my own. 
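
The statement that the St Pancras and Providence totals are not congruent can be checked directly from Table 1. The following is a minimal sketch in Python, not the computation of the paper: the function name is illustrative, and the 0.1 % point of $\chi^2$ for 19 degrees of freedom (about 43.82) is taken from standard tables.

```python
# Frequencies for intervals 1..20 days, copied from Table 1.
st_pancras = [341, 246, 160, 117, 96, 90, 99, 173, 206, 318,
              353, 329, 269, 205, 153, 104, 70, 48, 38, 35]
providence = [253, 184, 108, 110, 111, 114, 189, 247, 388, 519,
              511, 451, 344, 244, 142, 105, 58, 37, 25, 26]


def chi_square_homogeneity(a, b):
    """Chi-square statistic and degrees of freedom for a 2 x k table."""
    total_a, total_b = sum(a), sum(b)
    grand = total_a + total_b
    stat = 0.0
    for x, y in zip(a, b):
        column = x + y
        expected_a = total_a * column / grand
        expected_b = total_b * column / grand
        stat += (x - expected_a) ** 2 / expected_a
        stat += (y - expected_b) ** 2 / expected_b
    return stat, len(a) - 1


statistic, dof = chi_square_homogeneity(st_pancras, providence)
CRITICAL_0_1_PERCENT = 43.82  # tabulated value for 19 d.f. (assumption: standard table)
print(f"chi-square = {statistic:.1f} on {dof} d.f.")
print("exceeds 0.1% point:", statistic > CRITICAL_0_1_PERCENT)
```

The same function could be applied to the Providence selection against the remainder of the Providence total, after subtracting the selected frequencies as the text requires.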

Eighteen years ago I interested myself in the question whether any light could be thrown 
on the mechanics of infection by studying arithmetically the frequency distribution of 
multiple cases of disease in families. There should be, for instance, a contrast between 
multiple cases of enteric fever, which is certainly not conveyed by coughing and spitting, 
and multiple cases of measles, which almost certainly is. Unfortunately, data of multiple 
cases of enteric in families are hard to come by; but those of measles are common. I knew 
Dr Stocks was working on St Pancras records of measles and asked him to let me have some 
data. He kindly supplied me with four sets in which there were respectively 2, 3, 4 and 5 
children under 10, in addition to the child first infected, and three shorter series in which the 
children exposed were known not to have had measles before. I cite only series in which 60 
families were available. It was quite obvious that these data could not be fitted by ‘straight’ 
binomials for which the exponents were the numbers in family (other than the first child to go 
sick), and p was given by the ratio of total cases to total of exposed children. Then the idea 
of a chain suggested itself. Suppose the first child distributes infection binomially; he may 
infect no child or he may infect all the n exposed, with chances $q^n$ and $p^n$ (p being unknown);
in these instances no chain arises. But every other term of his binomial provides an opportunity; if he has infected $n-1$ (his chance of doing so being $np^{n-1}q$), this leaves one child
unaffected, who is exposed to infection from one of these $(n-1)$ secondaries who distributes
binomially with chance p′, and so one could have a family in which all n were infected. Now
if each new binomial distributor does so with a different p, we shall have arithmetical difficulties. Biologically it seemed very unlikely that the chaining p’s would all be equal. Dr
Stocks had in fact made it probable that exposure to infection which produces no clinical
signs or symptoms of disease may confer some immunity. Hence the $(n-1)$th child in series
is not so likely to infect the nth as the first to infect the second. However, it is sound empiricism to begin with the simplest hypothesis and to see if it works. Now if p is constant, the
algebra and arithmetic are simple enough. Take n = 2. Then $q^2$ should give the proportion of
families with no cases other than the primary, $2q^2(1-q)$ that of one case, and $1-3q^2+2q^3$ that
of two cases. The mean will be equal to $2-4q^2+2q^3$, so we can solve for q. If n were 3 the
highest power of q in the equation would be 6, if n were 4 it would be 10, and so on.* The
arithmetic would become tedious as n increased, but in real life, in these days of small families, 
it would be rare to have adequate data beyond families of 3! 
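
The n = 2 case can be worked numerically. The following is a minimal sketch in Python under stated assumptions: the counts 51, 67 and 298 are those quoted later in this paper for Wilson's 416 families, the function names are illustrative, and bisection is merely a convenient way of solving the cubic, not necessarily the method used in the original computations.

```python
def chain_probabilities_n2(q):
    """Proportions of families with 0, 1, 2 cases beyond the primary (n = 2)."""
    p = 1.0 - q
    return [q ** 2, 2 * q ** 2 * p, 1 - 3 * q ** 2 + 2 * q ** 3]


def solve_q(mean_cases, tol=1e-12):
    """Solve 2 - 4q^2 + 2q^3 = mean_cases for q in [0, 1] by bisection.

    The left-hand side falls steadily from 2 (q = 0) to 0 (q = 1),
    so there is exactly one root.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 2 - 4 * mid ** 2 + 2 * mid ** 3 > mean_cases:
            lo = mid  # model mean too large: q must be larger
        else:
            hi = mid
    return (lo + hi) / 2


counts = [51, 67, 298]  # families with 0, 1, 2 secondary cases
families = sum(counts)
mean_cases = sum(k * c for k, c in enumerate(counts)) / families
q_hat = solve_q(mean_cases)
expected = [families * pr for pr in chain_probabilities_n2(q_hat)]
print(f"q = {q_hat:.4f}")
print("expected:", [round(e, 1) for e in expected])
```

With these counts the fitted frequencies lie within a unit of the observed ones, which is the kind of close agreement with gross totals that the text describes.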

It is important to note that the assumption is made that the distribution of children in- 
fected by the first child, the primary, must be strictly binomial; as we shall soon see, this 
assumption may not be justified. 

I applied this process to Dr Stocks’s data, and in every instance the fit was satisfactory.
For n = 2, in the less select and longer series (358 families) P† was 0.75; in the smaller but


* See Greenwood (1931). 


† Here and below P relates to the $\chi^2$ test for goodness of fit, standing for the chance of a result which
is at least as divergent as that observed.




more select (299 families), P = 0.58. For n = 3, P was 0.36 for the longer and 0.42 for the
shorter series; for n = 4, 0.61 and for n = 5 (only 61 families) 0.42. I must confess that the
concordance pleased me and, no doubt, lulled a scepticism which, I think, is my usual habit
of mind. Really to verify the theory, one should break up the groups into their constituents,
thus in the simplest case, that of n = 2, the proportion due to the primary is $(1-q)^2$ and the
proportion derived by chaining $2(1-q)^2q$. Well, was it? I do not know; it is possible that
Dr Stocks could not have supplied the information and certain, in view of what I have 
already said of the difficulty of dissecting the frequency distributions of intervals, that the 
information could not be absolutely precise. But that is no excuse at all for not even asking! 

In the excellent memoir of Wilson and his colleagues already quoted, it is conclusively 
demonstrated that, to their data, this simple hypothesis does not apply. Like me, they found 
usually excellent fits to the gross data; for 416 families with n = 2, the value of P was 0.94,
for another set of 185 families, 0.68, and for 151 families with three susceptibles P = 0.48.

But, when the clubbed frequencies were dissected, the chain theory was demolished; in 
every instance the proportion infected by the primary was far larger and the proportion 
derived from chains far smaller than it should have been. One example (Wilson et al. 1938, 
p. 449) will suffice. 

The gross data consisted of 151 families with three susceptibles (other than the primary) 
giving 10 (9.4) 0’s, 7 (6.5) 1’s, 39 (33.1) 2’s and 95 (101.0) 3’s. The figures in brackets are the
expected numbers using the chain method with q = 0.392, and show an excellent fit, P = 0.48.
Now let us break up the 3’s into their chain constituents. We have

$$6Np^3q^3 = 12.7\ (4),\quad 3Np^3q^2 = 15.6\ (3),\quad 3Np^3q = 39.8\ (13),\quad Np^3 = 33.9\ (75),$$

where the figures in brackets are now the observed numbers, which together make up the 95 sets of three cases.


Perhaps it may be said that the allocation to the several groups depends on personal opinion; 
but in this instance—and, I have no doubt, in the others—different classifiers would not 
reach significantly different results. Dr Jane Worcester kindly sent me a copy of the working 
sheet allocating the 95 sets of three cases. Of the 75 attributed to the $Np^3$ group, the greatest
interval between the first and third in series was 5 days (two instances) and there were eight
instances of an interval of 4 days. On any reasonable hypothesis of short incubation most of
these must be attributed to the primary, and even if all were classed as intrafamiliar, the fit
to the chain hypothesis would not be materially improved. There is a still more conclusive
argument. The simple chain hypothesis assumes that the distribution of cases due to the
primary is a binomial distribution; in the Providence experience it is not.

This subject has been taken up in a recent paper by Prof. E. B. Wilson (1947). In a set of 
519 families with two susceptibles other than the primary, there were 49 0’s, 102 1’s and 368 
2’s. Take the ratio of total cases to population, 0.807, as p and n = 2. The binomial distribution
is 19, 162 and 338. In this particular set, the fit of the gross totals to the chain hypothesis was
poor, P = 0.02, but the binomial fails just as conspicuously when applied to the primary
distribution for 416 families which gave P = 0.94 for the chaining method when used on
massed frequencies. Here the gross totals were 51, 67 and 298. Of the 298 sets of two cases,
36 were found to be due not to the primary, so that the distribution due to the primary is:
0, 51; 1, 103; 2, 262. The binomial distribution for p = 0.7536 is 25.2, 154.5 and 236.3, which
is quite hopeless. This at once suggested, what I should have seen before posing the hypothesis, that if the families are heterogeneous in respect of risk, the summed frequencies
obtained by adding 0’s, 1’s, 2’s, etc., and deducing a p from the massed data could not be
a straight binomial. This was first pointed out, I think, by Karl Pearson (1917). It is easy to










see that, if we add in this way, the variance obtained will not be npq, where p and q are the
weighted means of the several values for the summed binomials, but will exceed npq by

$$\frac{n(n-1)}{N}\,S\{m_s(p_s-p)^2\}, \qquad (1)$$

where n is the exponent and $m_s$ and $p_s$ the number of observations and value of p for the sth
binomial; $N = S(m_s)$. Clearly this only vanishes if the variance of p vanishes.
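
Expression (1) is easily verified on a small case. In the sketch below (Python) the two subgroups, their sizes and their values of p are hypothetical numbers chosen only for illustration, and the function names are my own; the pooled variance is computed directly from the pooled frequencies and compared with npq plus the excess n(n − 1) times the weighted variance of p.

```python
from math import comb


def pooled_frequencies(n, sizes, ps):
    """Sum of binomial frequencies (exponent n) over subgroups of given sizes."""
    return [
        sum(m * comb(n, k) * p ** k * (1 - p) ** (n - k) for m, p in zip(sizes, ps))
        for k in range(n + 1)
    ]


def variance_from_frequencies(freqs):
    total = sum(freqs)
    mean = sum(k * f for k, f in enumerate(freqs)) / total
    return sum(f * (k - mean) ** 2 for k, f in enumerate(freqs)) / total


# Hypothetical illustration: two equal subgroups with different risks.
n = 2
sizes = [100, 100]
ps = [0.6, 1.0]

N = sum(sizes)
p_bar = sum(m * p for m, p in zip(sizes, ps)) / N
var_p = sum(m * (p - p_bar) ** 2 for m, p in zip(sizes, ps)) / N

pooled_var = variance_from_frequencies(pooled_frequencies(n, sizes, ps))
binomial_var = n * p_bar * (1 - p_bar)
excess = n * (n - 1) * var_p  # expression (1)

print(pooled_var, binomial_var + excess)
```

The two printed values agree, and the excess vanishes only when every subgroup has the same p.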

Is this, however, anything more than a debating point? I think it is, for this reason. It is 
hard to doubt that measles is conveyed by droplet infection and, if it is, the contiguity of the 
susceptibles to the source of infection must be a factor of the attack rate. So far as I know, 
there are no published data giving attack rates on groups of n susceptibles, primary attack 
rates, tabulated by numbers of rooms occupied. That mortality rates are negatively correlated 
with social status is, of course, a common-place of epidemiological literature and the usual 
explanation is that in poor families children are exposed to infection at an earlier age than in 
the families of the well-to-do; the fatality rate of measles diminishes steeply with advancing 
age. No doubt it is implicit in this argument that infection is increased by overcrowding 
but without data we certainly cannot say that in economically homogeneous data, the 
distribution of primary infections is binomial. Without fresh data, we can do no more than 
test whether the, by hypothesis, heterogeneous aggregate can be subdivided into constituent 
binomials. At first this seems trivial; if there are s subgroups and s values of p and the only 
conditions imposed are that the mean value of p and the variance of p must be reproduced 
and, of course, that all the p’s must be positive and less than unity, the number of solutions 
must be great. However, there are two other ‘common sense’ restrictions. The subfrequencies 
must be reasonable; the frequency distribution of persons per room must be unimodal and 
the p’s must decrease as the number of persons per room decreases. Take Wilson’s set of 519
families of two susceptibles. The variance of the aggregate (49 0’s, 102 1’s, 368 2’s) is 0.42569,
which exceeds the binomial variance, 2 × 0.807 × 0.193, by 0.11419, so if this aggregate is a
sum of binomial frequencies the variance of p is 0.05709. I split up the total of 519 families
into six subfrequencies, 47, 189, 147, 81, 25, 30. This is obtained from the proportional
frequencies of families of three living in 1, 2, 3, 4, 5 and 6 or more rooms in the Metropolitan
Borough of St Pancras in 1931. I then adjusted the p’s on this principle: that the first should
be unity, i.e. that in the very crowded tenements both children should be infected and that
in the most spacious tenements neither child should be infected and that the p’s should
decrease from 1 to 0. Actually if the intervening p’s are 0.9, 0.87, 0.795 and 0.389, the variance
of p is 0.057 and the aggregate subtotals closely reproduced (47, 106, 366). Of course, many
other solutions are practicable; the only value of the trial is that there is nothing plainly 
preposterous in the run of the p’s. 
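
The trial can be repeated in a few lines. This is a minimal sketch in Python using the six subfrequencies and the six values of p given in the text; the variable names are illustrative. It pools the six binomials with exponent 2 and, separately, deduces from the observed aggregate the variance of p that a sum of binomials would require.

```python
from math import comb

n = 2
sizes = [47, 189, 147, 81, 25, 30]          # families by number of rooms
ps = [1.0, 0.9, 0.87, 0.795, 0.389, 0.0]    # trial values of p from the text

# Pooled expected numbers of families with 0, 1, 2 secondary cases.
pooled = [
    sum(m * comb(n, k) * p ** k * (1 - p) ** (n - k) for m, p in zip(sizes, ps))
    for k in range(n + 1)
]
print("pooled subtotals:", [round(x) for x in pooled])

# Variance of p implied by the observed aggregate (49, 102, 368).
observed = [49, 102, 368]
N = sum(observed)
mean = sum(k * f for k, f in enumerate(observed)) / N
variance = sum(f * (k - mean) ** 2 for k, f in enumerate(observed)) / N
p_bar = mean / n
implied_var_p = (variance - n * p_bar * (1 - p_bar)) / (n * (n - 1))
print(f"aggregate variance = {variance:.5f}, implied variance of p = {implied_var_p:.4f}")
```

Other runs of p satisfying the same restrictions could be tried by editing the list `ps`.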

The obvious criticism from the practical point of view is that to reach homogeneity re- 
quires an enormous mass of data. Suppose what I am suggesting were done (in the records 
of public health departments in this country and in the United States an immense amount of 
information lies unpublished) there would still remain possibly relevant heterogeneity. Still, 
the problem is an interesting one; that the primary distribution should be binomial, or 
approximately binomial, is a seductive hypothesis. 

But it may fairly be objected that we have the massed data and it is interesting to try to 
interpret them. Wilson has made two suggestions. We infer from the data that measles does 
not spread within the family as if the children were independently infected and there are at 
















least two ways of characterising the dependence. The first may be described in his own words 
(the illustration is the set of 519 just discussed). 


One way to express dependence of elements is to compute the number which would be required by the 
theory of chance to explain the observed standard deviation. The actual secondary attack rates in the 
families with 0, 1, 2 secondary cases are 0, 0.5, 1.0 respectively; their mean is 0.807 and their standard
deviation squared (variance) is 0.106. If this be equated to pq/n we find n = 1.46. Thus the two susceptibles in the family are behaving relative to the contracting or escaping the infection as though they were
about one and one-half.

I have substituted 0.106 for 0.016 which is an evident misprint. Here p is the mean value of p.
The plan is to substitute for the observed table a binomial the exponent of which is deduced 
from the variance of p. I have a certain awe of binomials the exponents of which are not 
integers, perhaps owing to a recollection of v. Bortkiewicz’s furious polemic entitled 
Realismus und Formalismus in der mathematischen Statistik (1918) directed against a paper 
by L. Whitaker (1914). The last-named author had found better fits to some data used by 


v. Bortkiewicz to illustrate the Poisson series by using fractional or negative exponents to 
binomials. 


Taking Wilson’s example, we have: 


$$(0.8073 + 0.1927)^{1.46227} = 0.73124\,(1 + 0.2387)^{1.46227},$$
and the successive terms are
$$0.73124,\ 0.25523,\ 0.01408,\ -0.000603,\ 0.000056,\ \text{etc.}$$
Now if we use these as frequencies for variates 1.462, 0.462, −0.538, −1.538, −2.538, etc.,
and compute the mean and variance, we reproduce the proper mean and variance but do not
reproduce the frequencies. Ignoring terms after the third (the sum of the first three terms is
1.00055) we find for expected frequencies, 379.5, 132.5 and 7.3, not much better than the
integral binomial.
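
The arithmetic of the fractional binomial can be reproduced as follows. This is a minimal sketch in Python: the function names are illustrative, the exponent 1.46227 and p = 0.8073 are taken from the text, and the first part recomputes the exponent from the observed counts (49, 102, 368) by equating the variance of the family attack rates to pq/n.

```python
def effective_exponent(counts):
    """n such that pq/n equals the variance of the family attack rates."""
    size = len(counts) - 1                      # susceptibles per family
    families = sum(counts)
    rates = [k / size for k in range(size + 1)]
    p = sum(r * c for r, c in zip(rates, counts)) / families
    variance = sum(c * (r - p) ** 2 for r, c in zip(rates, counts)) / families
    return p * (1 - p) / variance


def fractional_binomial_terms(p, n, count):
    """First `count` terms of (p + q)^n expanded in powers of q/p."""
    q = 1.0 - p
    term = p ** n
    terms = [term]
    for k in range(1, count):
        term *= (n - k + 1) / k * (q / p)       # generalised binomial coefficient
        terms.append(term)
    return terms


n_eff = effective_exponent([49, 102, 368])
terms = fractional_binomial_terms(0.8073, 1.46227, 5)
expected = [519 * t for t in terms[:3]]

print(f"effective exponent = {n_eff:.3f}")
print("terms:", [round(t, 6) for t in terms])
print("expected frequencies:", [round(e, 1) for e in expected])
```

The fourth term is negative, which is why such an expansion cannot be read as a frequency distribution without some reserve.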
Wilson gives a set of 100 families with three susceptibles, as follows: 

No. attacked     3     2     1     0
Frequency       67    18    11     4

This time the mean p is 0.826667 and its variance 0.07884, so that n = 1.81745, giving for the
binomial $0.70754\,(1 + 0.20967)^{1.81745}$. The sum of the first five terms is 1.00016; neglecting the
rest, the mean and variance are correctly obtained to 3 places for the mean and 2 for variance,
but the frequencies are poorly reproduced, namely 70.7 for 67, 26.9 for 18, 2.3 for 11 and a
small negative value for 4.


A better method, as Wilson suggests, is to use the method of association. When there are 
two susceptibles, call them A and B and form the table 


               A
            +       −
  B    +  (AB)    (αB)
       −  (Aβ)    (αβ)

In this (AB) means the frequency of both susceptibles going down, (αB) the frequency of A
not falling sick but B doing so, etc. In such a table as that containing the set of 519 already 
discussed one cannot distinguish A from B (which, as Wilson notes, might be done if one 
used, for instance, age of each member of a pair as a distinction) so A must be put equal 
to B. One can then proceed on Yule’s lines for associated attributes. In the case before us 







the secondary attack rate is 0.807, but if one of the pair is attacked, the chance the other will
be attacked is 0.88, while if one of the pair escapes the chance the other will escape is 0.49.
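
These conditional chances follow from the set of 519 families once the 102 families with a single case are divided equally between the two unlike cells of the table. A minimal sketch in Python (the variable names are illustrative):

```python
# Wilson's 519 families with two susceptibles: 0, 1 and 2 secondary cases.
zeros, ones, twos = 49, 102, 368

both_attacked = twos        # (AB)
one_attacked = ones / 2     # each of the two unlike cells, A and B being indistinguishable
both_escape = zeros         # neither attacked

attack_rate = (ones + 2 * twos) / (2 * (zeros + ones + twos))
p_other_attacked = both_attacked / (both_attacked + one_attacked)
p_other_escapes = both_escape / (both_escape + one_attacked)

print(f"secondary attack rate         = {attack_rate:.3f}")
print(f"other attacked | one attacked = {p_other_attacked:.2f}")
print(f"other escapes  | one escapes  = {p_other_escapes:.2f}")
```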

It occurred to me that another way of bringing out Wilson’s point would be to use the 
theorem that if the events are not independent, $\sigma^2 = npq\{1 + r(n-1)\}$, where r is the arithmetic mean of the $\tfrac12 n(n-1)$ correlations of the variables, viz. the product-moment correlations of variables restricted to the values 0 or 1.

Suppose we have to do with n correlated ‘events’ of this type, we are concerned with a
succession of 0’s and 1’s and for each of the n items the mean is p and the variance pq. Hence,
if we estimate the expectation of the tth event from the results of the preceding $t-1$ events,
each regression coefficient will be a coefficient of partial correlation (for all variances are
equal) and we have

$$x_t - p = \rho_{t1\cdot 23\ldots}(x_1 - p) + \rho_{t2\cdot 13\ldots}(x_2 - p) + \ldots.$$

Now if we put in this equation the values of $x_1$, $x_2$, etc. (the values must all be 1’s or 0’s), we
reach, let us say, the value k. This will be its expectation. But, as $x_t$ must be either 0 or 1, this
amounts to saying that the probability that $x_t$ will be 1 is k and the probability it will be 0 is
$1-k$. The equation is, however, useless unless we know the values of the partial correlations.
Let us suppose they are all equal. In the particular case of n = 3 (the only one I shall discuss) they would be each $r/(1+r)$. The first event is 1 or 0 with chance p or q. The regression equation of the next event $x_2$ on $x_1$ is $x_2 = rx_1 + p(1-r)$, giving for $x_1 = 1$, $r + p(1-r)$, for
$x_1 = 0$, $p(1-r)$, so the probability of (11) is $p^2 + pqr$, of (10) $pq(1-r)$, of (00) $q^2 + rpq$ and of
(01) $pq(1-r)$.

If $x_1$ and $x_2$ are given, the regression of $x_3$ on them is

$$x_3 = x_1\,r/(1+r) + x_2\,r/(1+r) + p\{1 - 2r/(1+r)\},$$

and the 8 probabilities can be calculated. For instance, the probability of the succession (010)
will be

$$pq(1-r)\{1 - r/(1+r) - p(1 - 2r/(1+r))\} = \frac{pq(1-r)}{1+r}\{1 - p + pr\} = \frac{pq}{1+r}\{(1-r)(q+pr)\}.$$

The expectations of (010), (001) and (100) are, of course, the same, so the probability of 1
‘success’ is

$$\frac{3pq}{1+r}\{(1-r)(q+pr)\}.$$

In this way, we reach for the complete ‘successes’ distribution

$$\begin{array}{ll}
0 & \dfrac{q}{1+r}\,(q+pr)(q+pr+r),\\[2ex]
1 & \dfrac{3pq}{1+r}\,(q+rp)(1-r),\\[2ex]
2 & \dfrac{3pq}{1+r}\,(p+rq)(1-r),\\[2ex]
3 & \dfrac{p}{1+r}\,(p+qr)(p+qr+r),
\end{array}$$

leading to an area of unity, a mean of 3p and a variance of $3pq(1+2r)$. One can proceed in
this way step by step to deduce the 16 frequencies for the case n = 4 and so on.
Dr J. O. Irwin has, however, obtained an elegant general solution which he will, I hope,
publish, for it may be of value in other investigations. In these days of small families, one
will not often have sets of more than 3 susceptible children apart from the first infected in










measles and I confine myself to trying out the method on Wilson’s 100 sets of 3 susceptibles 
beyond the first. Here p = 0.82667, and r = 0.32538 giving:

Cases   Observed   Expected
  0         4         4.44
  1        11         9.68
  2        18        19.32
  3        67        66.56

$\chi^2$ = 0.7091 and P (for one degree of freedom) = 0.40.
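
The expected column can be recomputed from the four expressions for the equally correlated case of n = 3. A minimal sketch in Python, with an illustrative function name and the values of p and r given in the text; it also checks that the four probabilities have unit area, mean 3p and variance 3pq(1 + 2r).

```python
def correlated_triplet_distribution(p, r):
    """Chances of 0, 1, 2, 3 'successes' among three equally correlated 0/1 events."""
    q = 1.0 - p
    d = 1.0 + r
    return [
        q * (q + p * r) * (q + p * r + r) / d,
        3 * p * q * (q + p * r) * (1 - r) / d,
        3 * p * q * (p + q * r) * (1 - r) / d,
        p * (p + q * r) * (p + q * r + r) / d,
    ]


p, r = 0.82667, 0.32538
probs = correlated_triplet_distribution(p, r)
expected = [100 * pr for pr in probs]          # Wilson's 100 families

mean = sum(k * pr for k, pr in enumerate(probs))
variance = sum(pr * (k - mean) ** 2 for k, pr in enumerate(probs))

print("expected:", [round(e, 2) for e in expected])
print(f"mean = {mean:.5f}, variance = {variance:.5f}")
```

Setting r = 0 in the same function returns the ordinary binomial with exponent 3.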

One has a temptation to speculate on the results of not assuming that the correlations are 
equal, but it would be pure speculation and, although the method interests me, I am not sure 
that it is better than the straightforward calculations Wilson proposes, although it does permit
of deducing a $\chi^2$ which, for old acquaintance sake, is pleasing to me. As I have indicated
earlier in this paper, one needs more precise and homogeneous data than the aggregated 
results of a city survey furnish. The comparison of two aggregates is subject to another 
difficulty to which Wilson and his associates draw special attention (Wilson et al. 1938, 
pp. 443-4). Neither the St Pancras data of Stocks upon which I worked, nor those of 
Providence, R.I., are complete records of the occurrence of measles in the areas. Stocks 
estimated that as many as 70 % of the cases occurring were reported; Wilson and his col- 
leagues put the proportion in Providence, R.I., no higher than 50 %. The secondary attack 
rates in Providence were a good deal higher than in St Pancras. Prima facie measles was 
much less infectious under the conditions of family life in St Pancras than under those of 
Providence, R.I. Wilson et al. write:

One need not overlook the possibility that with only 47 per cent reported in Providence, it may be that 

those families with a large number of secondary cases in proportion to their total susceptibles are dispro- 
portionately frequent in the reported as compared with the unreported families. The discrepancy, however, 
between the secondary attack rate in St Pancras and Providence is so great that the rates cannot well be 
reconciled on the hypothesis that there is a differential in favour of higher attack rates in the reported 
families in Providence unless one assumes a differential in favour of lower attack rates in the reported as 
compared with the unreported families in St Pancras. This leaves a quite enigmatic situation and makes 
any comparisons problematical (op. cit. p. 444). 
I have nothing to add to this clear statement. One might surmise that in a family in which 
many children come down together or within a short time, i.e. are infected from an extrafamiliar source or from the first in series, domestic help may be more urgently needed than
when the disease is passed on by chaining. But I can see no reason why such a bias, if it
exists, should be more effective in St Pancras than in Providence, R.I.


REFERENCES 


GREENWOOD, M. (1931). J. Hyg., Camb., 31, 336-51.
PEARSON, K. (1917). Biometrika, 11, 139-44.
PICKLES, W. N. (1939). Epidemiology in Country Practice. Bristol.
SCHILLING, W. (1947). J. Amer. Statist. Ass. 42, 407-24.
STOCKS, P. & KARN, M. N. (1928). Ann. Eugen., Lond., 3, 361-98.
VON BORTKIEWICZ, L. (1918). Allgem. Statist. Arch. 9, 225.
WHITAKER, L. (1914). Biometrika, 10, 36-71.
WILSON, E. B. (1947). Proc. Nat. Acad. Sci., Wash., 33, 68-72.
WILSON, E. B., BENNETT, C., ALLEN, A. & WORCESTER, J. (1938). Proc. Amer. Philos. Soc. 80, 359-756.










A NOTE ON THE ANALYSIS OF GROUPED PROBIT DATA 
By K. D. TOCHER, B.Sc., National Physical Laboratory 


INTRODUCTION 


If the members of a group of objects are subjected to a common level of a certain stimulant 
and the number affected recorded, the proportion of these to the total is a measure of the 
effectiveness of the stimulant. If such data are available for several levels of the stimulant it 
is desirable to find a method of describing the effect in terms of the level of stimulant. 

Probit analysis achieves this by assuming that there is a critical level of stimulant asso- 
ciated with each object, being the least level giving an effect, all higher levels being assumed 
effective. The objects are supposed drawn at random from a population with a normal 
distribution of critical levels.* The relation between level and effect is completely specified by 
the mean and variance of this distribution. The statistical problem is reduced to estimating 
these parameters and assigning confidence limits on the level giving a fixed proportion of 
responses on randomly chosen subjects.

Many other real experiments may be reduced to this pattern by a change of language. 
A common variant in psychological work is obtained by subjecting groups of children of 
a common age to an intelligence test. The level of stimulant is replaced by the common age of 
the group, and the effect becomes the proportion passing the test. 

Whereas, in the usual probit problem, the level of stimulant is under the experimenter’s 
control and can be fixed exactly equal for each member of a group, the children in the groups 
will not all be exactly the same age. If each child is regarded as a group of one the usual 
technique may be applied, but the actual calculations are tedious due to the unequal spacings 
of the ages and the large number of groups. For this method to be applied the actual ages of
the children must be known, and this is not always the case. 

Thus it is desirable, and, in some cases, necessary, to have a method of analysis of grouped 
data. This paper derives a technique of achieving this and shows that an approximation to 
the correct maximum likelihood estimates can be obtained by applying Sheppard’s correc- 
tions to the variance estimated by neglecting the grouping in a normal probit analysis. 
Examples of the technique are given. 


THE EXACT MAXIMUM LIKELIHOOD ESTIMATES 


Suppose there are m groups, the sth group containing $n_s$ members, with ages uniformly
distributed in the range $(x_s, X_s)$, of which $r_s$ pass the test (s = 1, 2, ..., m).
If the critical age x has a normal distribution with mean $\mu$ and variance $\sigma^2$ then

$$Y = Y_x = \frac{1}{\sigma}(x-\mu) = ax + b \quad \text{(say)} \qquad (1)$$

has a standardized normal distribution.


* In some cases some simple transform of level, such as the logarithm, is assumed to have this normal
distribution. 








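The probability that a child of age x passes the test is

$$P(Y_x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{Y_x} e^{-\frac12 v^2}\,dv, \qquad (2)$$

and hence that for a child in the sth group is

$$P_s = \frac{1}{\Delta_s}\{\phi(X_s) - \phi(x_s)\}, \qquad (3)$$

where

$$\Delta_s = X_s - x_s \qquad (4)$$

and

$$\phi(x) = \int_{-\infty}^{x} P(Y_u)\,du. \qquad (5)$$
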
The usual iterative solution for the maximum likelihood estimates $\hat a$, $\hat b$ uses the mean-value theorem identities

$$\frac{\partial L}{\partial a_0} = \frac{\partial^2 L}{\partial a^2}(a_0 - \hat a) + \frac{\partial^2 L}{\partial a\,\partial b}(b_0 - \hat b),$$

$$\frac{\partial L}{\partial b_0} = \frac{\partial^2 L}{\partial a\,\partial b}(a_0 - \hat a) + \frac{\partial^2 L}{\partial b^2}(b_0 - \hat b),$$

where L is the likelihood of the result on the values a, b; $\partial L/\partial a_0$, $\partial L/\partial b_0$ are the partial derivatives of L with respect to a, b respectively at the values $a_0$, $b_0$ and $\partial^2 L/\partial a^2$, $\partial^2 L/\partial a\,\partial b$, $\partial^2 L/\partial b^2$
are the second derivatives of L.

These identities only hold exactly if the second-order derivatives are evaluated at the
appropriate point within the square with diagonal $(a_0, b_0)$, $(\hat a, \hat b)$. Since this point remains
unknown approximate values of the second-order derivatives are used, and the process
repeated on the resulting estimates until these no longer alter. The usual approximations
used are the expected values of the second-order derivatives at $(a_0, b_0)$.


Now L=C+ 4 {r,log P, + (n,—1,) log Q,}, 
where Q, = 1—P,, 


aL n, oPaP, 
El snag =-ZBG tq op (%F = 49). 


We require expressions for P,, 0P,/da, 0P,/db. Integration by parts gives 


oe) =| uf os a cde fal” wh entra 


1 fart+b l 
= ePY,)-2| (v-b) e-t* dy 





= 1, PW) + 2,)}, 


where Z(u) = ; +— e-t# 


(6) 


(7) 


(8) 


(9) 


(10) 











K. D. TocHEr 1l 
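$P_s$ is obtained from (3) and (9). Also 

$$\frac{\partial\phi(x)}{\partial a} = -\frac{1}{a^2}\{YP(Y) + Z(Y)\} + \frac{x}{a}P(Y) = \frac{1}{a}[xP(Y) - \phi(x)] = -\frac{1}{a^2}\{bP(Y) + Z(Y)\}, \tag{11}$$

$$\frac{\partial\phi(x)}{\partial b} = \frac{1}{a}P(Y), \tag{12}$$

the suffix $x$ on $Y$ being understood. 

Clearly 

$$\frac{\partial P_s}{\partial\alpha} = \frac{1}{\Delta_s}\left\{\frac{\partial\phi(X_s)}{\partial\alpha} - \frac{\partial\phi(x_s)}{\partial\alpha}\right\} \qquad (\alpha = a, b), \tag{13}$$

and this with (11) and (12) gives $\partial P_s/\partial a$, $\partial P_s/\partial b$. 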

The quantities required for the first step in the iteration can now be calculated, and with 
the new values of $a$ and $b$ obtained the process repeated. A slight saving of labour results 
from using the initial values of the second derivatives in the successive steps rather than 
recalculating them, without any material increase in the number of steps required. 
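
The iteration just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's computing scheme: the function names (`normal_cdf`, `normal_pdf`, `group_terms`, `fit_grouped_probit`) are hypothetical, the counts are those of Table 1 with ages measured from 9 years, and the expected second derivatives are recalculated at every step rather than held at their initial values.

```python
import math

def normal_cdf(y):
    """P(Y): the standard normal integral of equation (2)."""
    return 0.5 * math.erfc(-y / math.sqrt(2.0))

def normal_pdf(y):
    """Z(Y): the standard normal ordinate of equation (10)."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def group_terms(a, b, lo, hi):
    """Return P_s, dP_s/da, dP_s/db for ages uniform on (lo, hi)."""
    def phi_terms(x):
        y = a * x + b
        p, z = normal_cdf(y), normal_pdf(y)
        # phi(x), d(phi)/da, d(phi)/db from equations (9), (11), (12)
        return (y * p + z) / a, -(b * p + z) / a ** 2, p / a
    upper, lower = phi_terms(hi), phi_terms(lo)
    return tuple((u - v) / (hi - lo) for u, v in zip(upper, lower))

def fit_grouped_probit(groups, a, b, steps=20):
    """Scoring iteration; groups holds (lo, hi, n, r) tuples."""
    for _ in range(steps):
        score_a = score_b = i_aa = i_ab = i_bb = 0.0
        for lo, hi, n, r in groups:
            p, p_a, p_b = group_terms(a, b, lo, hi)
            q = 1.0 - p
            xi = r / p - (n - r) / q      # dL/dP_s for this group
            w = n / (p * q)               # weight in the expected -d2L
            score_a += xi * p_a
            score_b += xi * p_b
            i_aa += w * p_a * p_a
            i_ab += w * p_a * p_b
            i_bb += w * p_b * p_b
        det = i_aa * i_bb - i_ab * i_ab
        a += (i_bb * score_a - i_ab * score_b) / det
        b += (i_aa * score_b - i_ab * score_a) / det
    return a, b

# Table 1 counts (n, r) for age groups 9..16; end-points -0.5, 0.5, ..., 7.5
counts = [(48, 0), (100, 0), (95, 1), (100, 5),
          (91, 13), (94, 47), (56, 38), (33, 29)]
groups = [(i - 0.5, i + 0.5, n, r) for i, (n, r) in enumerate(counts)]
a_hat, b_hat = fit_grouped_probit(groups, 0.70, -3.70)
mean_age = 9.0 - b_hat / a_hat
```

Started from the rough values used in the numerical example, the sketch should settle close to the estimates reported there, provided the Table 1 counts are as transcribed.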


A NUMERICAL EXAMPLE 


In the particular but important case in which the groups are of equal width, non-overlapping 
and completely covering a range of ages the calculations can be reduced to a reasonable 
tabular form shown for an example in Table 2. 

In this case the centres of the age groups are separated by equal distances which may be 
used as a unit of age. The upper end-point of one group is the lower end-point of the next, and 
$P_s$, with its derivatives, can be formed by differencing $\phi(x)$ and its derivatives. 

(1) For each end-point the following functions are tabulated using the trial values of $a$, $b$: 
(i) $Y$, (ii) $P$, (iii) $Z$, (iv) $YP + Z$, (v) $bP + Z$. 

(2) The differences of (iv) are $aP$, from which $aQ$ is easily obtained (cols. (vi) and (vii), 
Table 2). 

(3) Using the data, $\xi = \dfrac{r}{aP} - \dfrac{n - r}{aQ}$ is then tabulated for each group (cols. (viii) and (ix)). 

(4) (ii) and (v) are differenced giving $a\,\dfrac{\partial P}{\partial b}$ and $-a^2\,\dfrac{\partial P}{\partial a}$ respectively (cols. (x) and (xi)). 

(5) The scalar products of these two columns with $\xi$ give $\dfrac{\partial L}{\partial b}$ and $-a\,\dfrac{\partial L}{\partial a}$. 

(6) $\dfrac{n}{(aP)(aQ)}$ is formed as col. (xii). 

(7) The products of (xii) with (x) and (xi) are tabulated (cols. (xiii) and (xiv)). 

(8) The scalar products of (x) and (xi) with (xiii) and (xiv) give 

$$-\frac{\partial^2 L}{\partial b^2}, \qquad +a\,\frac{\partial^2 L}{\partial a\,\partial b}, \qquad -a^2\,\frac{\partial^2 L}{\partial a^2}.$$

The mixed derivative is formed twice as a check. 

(9) Solving the system of linear equations in the usual way corrections to the trial $(a, b)$ 
are found, and the process repeated to stage (5) as often as necessary. 










The usual process of reducing the calculation to a weighted regression line is not possible 
with grouped data, as P is not simply a function of Y but also depends explicitly on a. 
The calculation is now illustrated with an example. Table 1 gives the number of children 


in various groups who show certain secondary sex characteristics together with the total 
number in each class. 


Table 1. Results of examination of children for secondary sex characteristics classified by age 

Age group*    No. in group    No. with sex characteristics in group 
    9              48               0 
   10             100               0 
   11              95               1 
   12             100               5 
   13              91              13 
   14              94              47 
   15              56              38 
   16              33              29 

* Children whose ages are x years ± 6 months are in age group x. 


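Converting the proportions of sex-characterized children into normal equivalent deviates, 
and plotting against age we obtain as rough estimates 

$$a = 0.70, \qquad b = -3.70 \quad \text{(using an origin of 9 years).}$$

The end-points are $8\tfrac{1}{2}, 9\tfrac{1}{2}, \ldots, 16\tfrac{1}{2}$ years. Table 2 contains the necessary calculations as outlined above for the first step of the iteration arranged in tabular form. This gives 

$$\frac{\partial L}{\partial a} = 15.7004, \quad \frac{\partial L}{\partial b} = 0.62830, \quad \frac{\partial^2 L}{\partial a^2} = -4066.41, \quad \frac{\partial^2 L}{\partial b^2} = -174.699,$$

$$\frac{\partial^2 L}{\partial a\,\partial b} = -814.832, \quad \Delta = \frac{\partial^2 L}{\partial a^2}\frac{\partial^2 L}{\partial b^2} - \left(\frac{\partial^2 L}{\partial a\,\partial b}\right)^2 = 4.64476^{4},\dagger \quad \Delta^{-1} = 2.15301^{-5}.$$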


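The corrections to subtract are: 

$$\delta a = \frac{1}{\Delta}\left(\frac{\partial^2 L}{\partial b^2}\frac{\partial L}{\partial a} - \frac{\partial^2 L}{\partial a\,\partial b}\frac{\partial L}{\partial b}\right) = 2.15301^{-5} \times (-2.23089^{3}) = -0.048,$$

$$\delta b = \frac{1}{\Delta}\left(\frac{\partial^2 L}{\partial a^2}\frac{\partial L}{\partial b} - \frac{\partial^2 L}{\partial a\,\partial b}\frac{\partial L}{\partial a}\right) = 2.15301^{-5} \times 1.02383^{4} = +0.220.$$

This gives new values of $a = 0.748$, $b = -3.920$. 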
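
The arithmetic of this first step can be checked with a short Python sketch. The function name `scoring_corrections` is hypothetical; the inputs are the first and second derivatives quoted above for the trial values $a = 0.70$, $b = -3.70$.

```python
def scoring_corrections(grad_a, grad_b, l_aa, l_ab, l_bb):
    """Solve the two linear equations for the corrections to subtract."""
    delta = l_aa * l_bb - l_ab ** 2
    da = (l_bb * grad_a - l_ab * grad_b) / delta
    db = (l_aa * grad_b - l_ab * grad_a) / delta
    return da, db

# Derivatives from the first step of the worked example
da, db = scoring_corrections(15.7004, 0.62830, -4066.41, -814.832, -174.699)

# The corrections are subtracted from the trial values
a_new, b_new = 0.70 - da, -3.70 - db
```

To three decimals the corrections and the new trial values agree with those quoted in the text.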
Repeating the first step but omitting columns (xii), (xiii), (xiv), which are only needed to 
determine the second derivatives, we obtain $\partial L/\partial a = -0.630275$ and $\partial L/\partial b = -0.299514$. 
These give, using the old second derivatives, corrections $\delta a = -0.003$, $\delta b = +0.015$. The new 
start is $a = 0.751$, $b = -3.935$. 


The successive corrections and approximate values are given in Table 4. 


† $4.64476^{4}$ means $4.64476 \times 10^{4}$, and similarly elsewhere. 








[Table 2. Tabular form of calculation for grouped probit data] 

[Table 3. Calculation for probit data] 


Table 4. Steps in iteration 

          Starting values          Corrections (subtract) 
Step      a          b             a             b 
 1        0.70      -3.70         -0.048        +0.220 
 2        0.748     -3.920        -0.003        +0.015 
 3        0.751     -3.935        -0.0003       +0.0022 
 4        0.7513    -3.9372       -0.00004      +0.0004 
 5        0.7513    -3.9376       -0.000006      0.00003 























The final maximum likelihood estimates are thus $a = 0.7513$ and $b = -3.9376$, obtained in 
five steps. Thus the distribution of critical ages for showing the sex characteristic in question 
has a mean $9 + \dfrac{3.9376}{0.7513} = 14.24$ and a standard deviation $\dfrac{1}{0.7513} = 1.33$. 

The number of significant figures retained has been 6 to 7 throughout. It has been found 
from practical examples that this figuring is required if the derivatives are to be obtained to 
3 to 4 significant figures. The use of less accurate values would increase the number of steps in 
the iteration, and it seems more profitable to carry a few extra figures at each stage to avoid 
these additional steps. 


AN APPROXIMATE SOLUTION 


Each step in the iteration proposed and illustrated in the last section is a lengthy calculation 
and renders the whole process very tedious. If a simple method can be obtained for reaching 
an approximate solution of greater accuracy than the usual guess trial the number of steps of 
the form above can be reduced to 1 or 2. Since the data are formed by grouping probit data, 
corrections to the ungrouped answers would seem a promising approach. 


We consider the case of equal width groups used as unit scale of age, although these need 
not completely cover any range. 


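Expanding the integrand below as a Taylor expansion about the value $Y$ at the centre of the group we have 

$$P_s = \frac{1}{a}\int_{Y-\frac{1}{2}a}^{Y+\frac{1}{2}a} P(y)\,dy \simeq \frac{1}{a}\int_{Y-\frac{1}{2}a}^{Y+\frac{1}{2}a} \Big\{P(Y) + (y - Y)Z(Y) - \tfrac{1}{2}(y - Y)^2\,YZ(Y)\Big\}\,dy = P(Y) - \frac{a^2}{24}\,YZ(Y),$$

$$\frac{\partial P_s}{\partial\alpha} = \frac{\partial Y}{\partial\alpha}\,Z\left\{1 - \frac{a^2}{24}(1 - Y^2)\right\},$$

$$\frac{r}{P_s} - \frac{n-r}{Q_s} = \frac{r}{P}\left(1 - \frac{a^2 YZ}{24P}\right)^{-1} - \frac{n-r}{Q}\left(1 + \frac{a^2 YZ}{24Q}\right)^{-1} = \left(\frac{r}{P} - \frac{n-r}{Q}\right) + \frac{a^2}{24}\,YZ\left(\frac{r}{P^2} + \frac{n-r}{Q^2}\right),$$

where $Q = 1 - P$, the argument $Y$ in $P$ and $Z$ is understood and $a^2/24$ is assumed small. 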
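
As a rough numerical check of this expansion, the following Python sketch compares the exact group probability, obtained by differencing $\phi(x)$, with the second-order approximation. The function names and the particular values of $a$, $b$ and the group centre are illustrative assumptions, chosen to lie near the fitted line of the example.

```python
import math

def normal_cdf(y):
    """P(Y): standard normal integral."""
    return 0.5 * math.erfc(-y / math.sqrt(2.0))

def normal_pdf(y):
    """Z(Y): standard normal ordinate."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def exact_group_probability(a, b, centre):
    """P_s for a unit-width group, using phi(x) = {Y P(Y) + Z(Y)} / a."""
    def phi(x):
        y = a * x + b
        return (y * normal_cdf(y) + normal_pdf(y)) / a
    return phi(centre + 0.5) - phi(centre - 0.5)

def approximate_group_probability(a, b, centre):
    """Second-order expansion P(Y) - a^2 Y Z(Y) / 24 about the group centre."""
    y = a * centre + b
    return normal_cdf(y) - a * a * y * normal_pdf(y) / 24.0

a, b, centre = 0.75, -3.94, 4.0   # illustrative values only
exact = exact_group_probability(a, b, centre)
approx = approximate_group_probability(a, b, centre)
uncorrected = normal_cdf(a * centre + b)   # ignores the grouping entirely
```

For values like these the corrected figure lies much nearer the exact probability than the uncorrected $P(Y)$ does, which is what makes $a^2/24$ a usable small parameter.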





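Hence 

$$\frac{\partial L}{\partial\alpha} = \sum \frac{\partial Y}{\partial\alpha}\,Z\left(\frac{r}{P} - \frac{n-r}{Q}\right) + \frac{a^2}{24}\sum \frac{\partial Y}{\partial\alpha}\,Z\left[\,YZ\left(\frac{r}{P^2} + \frac{n-r}{Q^2}\right) - (1 - Y^2)\left(\frac{r}{P} - \frac{n-r}{Q}\right)\right],$$

where $\alpha = a$ or $b$, and the summation is over a suffix $s$, dropped to shorten the notation. The 
terms in square brackets may, for $a^2/24$ small, be regarded as corrections to the first term, 
which is the value of $\partial L/\partial\alpha$ obtained treating the data as ungrouped. We replace the 
correction terms by their expected values, obtaining 

$$\frac{\partial L}{\partial\alpha} \simeq \frac{\partial L_0}{\partial\alpha} + \frac{a^2}{24}\sum \frac{nZ^2}{PQ}\,Y\,\frac{\partial Y}{\partial\alpha},$$

$L_0$ being the likelihood calculated on assumption of ungrouped data. 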
Even cruder approximations serve for the second derivatives, as these approximations 
only affect the rate of convergence of the iterative system and not the final answer. We shall 
use the ungrouped values, viz. 


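$$E\left[\frac{\partial^2 L}{\partial\alpha\,\partial\beta}\right] = -\sum \frac{nZ^2}{PQ}\,\frac{\partial Y}{\partial\alpha}\,\frac{\partial Y}{\partial\beta} \qquad (\alpha, \beta = a, b).$$

Using the usual notation 

$$\frac{Z^2}{PQ} = w,$$

the iteration equations reduce to 

$$\sum nwx^2\,\delta a + \sum nwx\,\delta b = \frac{\partial L_0}{\partial a} + \frac{a^2}{24}\sum nwYx, \qquad \sum nwx\,\delta a + \sum nw\,\delta b = \frac{\partial L_0}{\partial b} + \frac{a^2}{24}\sum nwY.$$

$\delta a$, $\delta b$ are corrections to trial values $a_0$, $b_0$. Let $\delta a_0$, $\delta b_0$ be corrections treating the data as 
ungrouped. Then if $\delta a_1 = \delta a - \delta a_0$, $\delta b_1 = \delta b - \delta b_0$, we have 

$$\sum nwx^2\,\delta a_1 + \sum nwx\,\delta b_1 = \frac{a^2}{24}\Big\{\sum nwx^2\,a_0 + \sum nwx\,b_0\Big\},$$

$$\sum nwx\,\delta a_1 + \sum nw\,\delta b_1 = \frac{a^2}{24}\Big\{\sum nwx\,a_0 + \sum nw\,b_0\Big\},$$

and the solution of these is clearly 

$$\delta a_1 = \frac{a^2}{24}\,a_0, \qquad \delta b_1 = \frac{a^2}{24}\,b_0.$$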


Finally, if $a$, $b$ are maximum likelihood estimates of data treated as ungrouped, approximate 
solutions for the estimates allowing for grouping are 

$$\hat{a} = a\left(1 + \frac{a^2}{24}\right), \qquad \hat{b} = b\left(1 + \frac{a^2}{24}\right).$$

The mean critical age is $-\hat{b}/\hat{a} = -b/a$, while the variance is 

$$\frac{1}{\hat{a}^2} = \frac{1}{a^2}\left(1 + \frac{a^2}{24}\right)^{-2} \simeq \frac{1}{a^2} - \frac{1}{12}.$$

Thus our approximation reduces to the usual Sheppard’s correction for grouping applied to 
the mean and variance estimated assuming ungrouped data.* 

* If the group width is $R$, the multiplying factor above becomes $\left(1 + \dfrac{a^2R^2}{24}\right)$, giving the usual reduction 
of $R^2/12$. It should be recorded that the analysis of this section originated from a suggestion of E. C. 
Fieller that such corrections would account for the difference between the two analyses. 













At first this result is rather surprising. However, it must be remembered that the corrections 
to moments introduced by Sheppard are true for the populations, and the use on 
samples is simply an extension of the principle of estimation by equating sample moments to 
population moments introduced by K. Pearson. As all estimates are approximations to the 
true values of parameters, all estimates of moments from grouped data will be approximately 
related to the moments from ungrouped data by Sheppard’s relations. 


NUMERICAL ILLUSTRATION OF APPROXIMATION 


We may apply this approximation to the example used before. Using the same guess position 
$a = 0.70$, $b = -3.70$ a normal probit analysis is performed. 

The arithmetic necessary for this is set out in Table 3, using the improved technique 
suggested recently by Fieller. This table contains the complete working for all the necessary 
steps of the iteration, while Table 2 consists of only one step in the iteration for that method. 


This comparison of the two tables emphasizes the amount of computing this approximation 
saves. 

Treating the data as ungrouped we obtain 

$$a' = 0.734, \qquad b' = -3.848.$$

Applying the correction we have 

$$\hat{a} \simeq a'\left(1 + \frac{a'^2}{24}\right) = 0.750, \qquad \hat{b} \simeq b'\left(1 + \frac{a'^2}{24}\right) = -3.934.$$

This approximation differs from the correct solution by about 1 part in 1000 and for most 
purposes would be adequate. However, if greater accuracy is required only one step of the 
exact process is necessary to give the solution 

$$a = 0.7513, \qquad b = -3.9373$$

correct to four significant figures. In obtaining this result it is immaterial whether freshly 
calculated second derivatives or the ungrouped approximations $-\sum w$, $-\sum wx$, $-\sum wx^2$ from 
Table 3 are used. 
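
The correction is simple enough to express as a one-line function. The Python sketch below is illustrative only: the name `sheppard_corrected` and the optional `width` argument (the group width $R$ of the footnote, in the same units as the ages) are assumptions, and the inputs are the ungrouped estimates quoted above.

```python
def sheppard_corrected(a_ungrouped, b_ungrouped, width=1.0):
    """Multiply the ungrouped estimates by (1 + a^2 R^2 / 24)."""
    factor = 1.0 + (a_ungrouped * width) ** 2 / 24.0
    return a_ungrouped * factor, b_ungrouped * factor

a_corr, b_corr = sheppard_corrected(0.734, -3.848)

# The mean critical age -b/a is unchanged by the common factor
mean_age = 9.0 - b_corr / a_corr

# The variance 1/a^2 is reduced by very nearly 1/12 (unit group width)
variance_corrected = 1.0 / a_corr ** 2
variance_sheppard = 1.0 / 0.734 ** 2 - 1.0 / 12.0
```

Rounded to three decimals the corrected values reproduce the figures given in the text, and the two variance expressions differ only in the third decimal place.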


THE CALCULATION OF CONFIDENCE INTERVALS 


At the present stage of development exact confidence limits are not available for the quantiles 
in an ungrouped probit analysis. The usual approximation used depends on the limiting 
normal form of distribution of maximum likelihood estimates and the substitution of sample 
estimates of the parameters into the expressions for the second moments of that distribution. 

Intervals of the same type can be deduced in the grouped case. Those for the quantile 
corresponding to a normal equivalent deviate $Y$ are given (Fieller, 1944) by the roots of 

$$(\Delta a^2 + z^2 L_{22})\,x^2 + 2(\Delta a(b - Y) - z^2 L_{12})\,x + (\Delta(b - Y)^2 + z^2 L_{11}) = 0,$$

where $a$, $b$ are maximum likelihood estimates, $z$ is the normal equivalent deviate corresponding 
to the confidence level required, $L_{11}$, $L_{12}$, $L_{22}$ are $\partial^2 L/\partial a^2$, $\partial^2 L/\partial a\,\partial b$, $\partial^2 L/\partial b^2$ respectively, 
and finally $\Delta = L_{11}L_{22} - L_{12}^2$. 

These roots are 

$$x = \{z^2 L_{12} - \Delta a(b - Y) \pm z\,\Delta^{\frac{1}{2}}[\Omega - z^2]^{\frac{1}{2}}\}/(\Delta a^2 + z^2 L_{22}),$$

where 

$$\Omega = -\{a^2 L_{11} + 2a(b - Y)L_{12} + (b - Y)^2 L_{22}\}.$$










In the example the 95% confidence interval for the median age is obtained by putting 

$$Y = 0, \quad z = 1.96, \quad a = 0.7513, \quad b = -3.9376,$$

$$L_{11} = -3913.82, \quad L_{12} = -783.214, \quad L_{22} = -166.523,^{*} \quad \Delta = 38315.3,$$

whence 

$$x = \frac{116346.23703 \pm 37236.77602}{22266.79} = 3.553 \text{ and } 6.897,$$

measured from 9 years as origin. Thus the median age of the age distribution is between 
12.55 and 15.90. If the data were ungrouped the corresponding interval would be 12.70 to 
15.76. The increase in length of confidence interval with grouping is to be expected as there 
is a loss of information on grouping. 

If the interval is calculated using the grouped values of $a$ and $b$ but the ungrouped values of 
the $L$'s, we obtain 12.66 to 15.79, limits which are still too close. Thus in applying the approximate 
method, one iteration by the exact method at the end is desirable to furnish exact values of 
the $L$'s. 


SUMMARY 


1. The necessary equations for an iterative solution to the maximum likelihood estimates 


of mean and variance in the underlying normal distribution of grouped probit data are 
derived. 


2. In the case of equal interval grouping a numerical process is described and illustrated. 
3. For the same case it is shown that Sheppard corrections to the estimates obtained from 


treating data as ungrouped are a close approximation to the exact answer, and this is 
illustrated in the previous example. 


4. The usual process for determining confidence intervals is used and the increase in length 
of interval noted. 


The work described above has been carried out as part of the research programme of the 


National Physical Laboratory, and this paper is published by permission of the Director of the 
Laboratory. 


REFERENCE 
Fieller, E. C. (1944). Quart. J. Pharm. 17, 117. 


* These are values obtained at last step of iteration. 




A GENERALIZATION OF POISSON’S BINOMIAL LIMIT 
FOR USE IN ECOLOGY 


By MARJORIE THOMAS, University College, London 


1. It is of interest to field ecologists to estimate the abundance of a given species in a 
commonwealth of plants. The method usually employed for this purpose is that of sampling 
by quadrat. A square lattice—the quadrat—is dropped at random points in the common- 
wealth, and the number of plants of the given species found in the quadrat is counted. 

It is thus possible, with repeated sampling, to form a frequency distribution of the number 
of quadrats containing $k$ plants $(k = 0, 1, 2, \ldots)$, and the mean of this distribution gives an 
estimate of the density of the species, that is, the frequency with which, on the average, the 
species occurs in the commonwealth. In describing such observations mathematically, it has 
been customary to assume that the individual plant has no area, and further, that they are 
distributed randomly within the commonwealth studied. Provided the quadrat is large 
compared with the individual plant, this first assumption is justifiable, but the second, that of 
randomness, is recognized by the plant ecologist to be far removed from reality in the case of 
many species. Archibald (1948) has collected material and analysed it to show that for 
a number of species the hypothesis of randomness will not hold owing to the tendency of the 
plants to cluster together. We put forward here a series which will allow for this clustering, 
and which will also enable us to obtain an estimate of it. 

2. It is a characteristic of the observational series collected from quadrat sampling that the 
variance is greater than the mean, a result which is attributable to the clustering of the 
observations. Were the plants distributed at random, it might be expected that Poisson’s 
binomial limit would describe the distribution—as, in fact, it does for some species—so 
a generalization of Poisson suggests itself as appropriate for species in which the clustering 
affects the variance. Archibald (1948) fitted Neyman’s (1939) contagious series to such 
observations, the parameters $m_1$ and $m_2$ of the distribution being taken as proportional 
respectively to the number of clusters in the data and the average number of plants per 
cluster. 

Another generalization arises from the following set-up. We assume an area over which 
a number of points is distributed at random. With each of these points a random number of 
other points is associated. The area is now divided into squares, and we calculate the 
probabilities that a square contains 0, 1, 2, ... points. Thus if $x$ is the random variable associated 
with the first distribution of points, we write 

$$P\{x \text{ points in any one square}\} = \frac{e^{-m} m^x}{x!}.^{*}$$

Let $y$ be the random variable associated with the random number of points related to the 
first points, so that 

$$P\{y + 1 \text{ points in any one group}\} = \frac{e^{-\lambda} \lambda^y}{y!}.$$

* The notation P{ } is used in this paper to denote probability. Thus the expression P{x points in any 
one square} may be understood to mean ‘the probability that the number of points in any one square is x’. 










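If now the first points are taken to represent cluster centres, and the second points the 
number of additional plants (after the first) in a cluster, then for any quadrat, 

$$P\{0 \text{ plants}\} = P\{0 \text{ clusters}\} = e^{-m},$$

$$P\{1 \text{ plant}\} = P\{1 \text{ cluster of 1 plant}\} = m e^{-m} e^{-\lambda},$$

$$P\{2 \text{ plants}\} = P\{1 \text{ cluster of 2 plants}\} + P\{2 \text{ clusters of 1 plant}\}$$

$$= m e^{-m} \lambda e^{-\lambda} + \frac{m^2 e^{-m}}{2!}(e^{-\lambda})^2 = \frac{m e^{-(m+\lambda)}}{2}\,(2\lambda + m e^{-\lambda}).$$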





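In general, we have 

$$P\{k \text{ plants}\} = \sum_{r=1}^{k} \sum_{a} P\{r \text{ clusters having } a_1, a_2, \ldots, a_r \text{ plants}\},$$

where $a_1 + a_2 + \ldots + a_r = k$ and $a_j > 0$ for $j = 1, 2, \ldots, r$. 
The second summation is over all possible sets of $a$ which fulfil these conditions. Thus 

$$P\{k \text{ plants}\} = \sum_{r=1}^{k} \frac{m^r e^{-m}}{r!} \sum_{a} \prod_{j=1}^{r} \frac{\lambda^{a_j - 1} e^{-\lambda}}{(a_j - 1)!} = \sum_{r=1}^{k} \frac{m^r e^{-m}}{r!}\,\lambda^{k-r} e^{-r\lambda} \sum_{a} \prod_{j=1}^{r} \frac{1}{(a_j - 1)!}.$$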


r 


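3. We require to evaluate $\sum_{a} \prod_{j=1}^{r} \dfrac{1}{(a_j - 1)!}$. For simplicity write $\xi_j = a_j - 1$. Then 

$$\sum_{j=1}^{r} \xi_j = \sum_{j=1}^{r} (a_j - 1) = k - r,$$

and since $a_j$ is an integer greater than zero, we have $\xi_j \geq 0$ for $j = 1, 2, \ldots, r$. 
Now consider the expansion of 

$$(p_1 + p_2 + \ldots + p_r)^{k-r},$$

where $p_i = 1$ for $i = 1, 2, \ldots, r$. It is clear that 

$$(k - r)! \prod_{j=1}^{r} \frac{p_j^{\xi_j}}{\xi_j!}$$

is a term in this expansion and, writing $\sum_{\xi}$ to denote summation over all possible sets 
of $\xi$, we have 

$$\sum_{a} (k - r)! \prod_{j=1}^{r} \frac{1}{(a_j - 1)!} = \sum_{\xi} (k - r)! \prod_{j=1}^{r} \frac{p_j^{\xi_j}}{\xi_j!} = (p_1 + p_2 + \ldots + p_r)^{k-r} = r^{k-r}.$$

It follows that 

$$P\{k \text{ plants in any one quadrat}\} = \sum_{r=1}^{k} \frac{m^r e^{-m}}{r!}\,\frac{(r\lambda)^{k-r} e^{-r\lambda}}{(k - r)!}. \tag{1}$$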
r=1 ‘ ld 





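Giving $k$ successive values 1, 2, 3, ..., the required series is obtained. We shall refer to it as 
the double Poisson distribution. As a check we may note that 

$$\sum_{k=1}^{\infty} \sum_{r=1}^{k} \frac{m^r e^{-m}}{r!}\,\frac{(r\lambda)^{k-r} e^{-r\lambda}}{(k - r)!} = \sum_{r=1}^{\infty} \frac{m^r e^{-m}}{r!} \sum_{k=r}^{\infty} \frac{(r\lambda)^{k-r} e^{-r\lambda}}{(k - r)!} = \sum_{r=1}^{\infty} \frac{m^r e^{-m}}{r!} = 1 - e^{-m},$$

which is as would be expected. 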


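4. The moments of the double Poisson distribution may be obtained most simply by 
the device just used of inverting the order of summation. For the first moment we have 

$$\mu_1' = \sum_{k=1}^{\infty} \sum_{r=1}^{k} k\,\frac{m^r e^{-m}}{r!}\,\frac{(r\lambda)^{k-r} e^{-r\lambda}}{(k - r)!}$$

$$= \sum_{r=1}^{\infty} \frac{m^r e^{-m}}{r!} \sum_{k=r}^{\infty} \{(k - r) + r\}\,\frac{(r\lambda)^{k-r} e^{-r\lambda}}{(k - r)!}$$

$$= \sum_{r=1}^{\infty} \frac{m^r e^{-m}}{r!}\,(r\lambda + r)$$

$$= m(1 + \lambda). \tag{2}$$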


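Similarly, for the second moment about the origin, 

$$\mu_2' = \sum_{r=1}^{\infty} \frac{m^r e^{-m}}{r!} \sum_{k=r}^{\infty} \{(k - r)^2 + 2r(k - r) + r^2\}\,\frac{(r\lambda)^{k-r} e^{-r\lambda}}{(k - r)!}$$

$$= m\lambda + (m^2 + m)(1 + \lambda)^2.$$

Converting to the moment about the mean, 

$$\mu_2 = m\lambda + m(1 + \lambda)^2 = m(1 + 3\lambda + \lambda^2). \tag{3}$$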
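
Equation (1) and the moment results (2) and (3) can be checked numerically. The Python sketch below is an illustration only: the function name `double_poisson_pmf` is hypothetical, the parameter values are arbitrary (chosen to be of the size met in the examples that follow), and the infinite sums are truncated at 80 terms, where the remaining tail is negligible for these values.

```python
import math

def double_poisson_pmf(k, m, lam):
    """P{k plants in any one quadrat}, equation (1); P{0} = exp(-m)."""
    if k == 0:
        return math.exp(-m)
    total = 0.0
    for r in range(1, k + 1):
        # r cluster centres in the quadrat (Poisson, mean m)
        clusters = math.exp(-m) * m ** r / math.factorial(r)
        # k - r additional plants spread over the r clusters (Poisson, mean r*lam)
        extras = math.exp(-r * lam) * (r * lam) ** (k - r) / math.factorial(k - r)
        total += clusters * extras
    return total

m, lam = 0.573, 1.755     # illustrative parameter values
probs = [double_poisson_pmf(k, m, lam) for k in range(80)]
total = sum(probs)
mean = sum(k * p for k, p in enumerate(probs))
variance = sum(k * k * p for k, p in enumerate(probs)) - mean ** 2
```

The probabilities sum to unity, and the mean and variance agree with $m(1+\lambda)$ and $m(1+3\lambda+\lambda^2)$ to within the truncation error.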


5. The numerical fitting of the series to observational material can be carried out in at least 
two ways. The first, and the more usual statistically, is to calculate the mean and variance of 
the observed series and, equating these numbers to the theoretical mean and variance, solve 
for $m$ and $\lambda$. The disadvantage of this procedure from the ecologist’s point of view is that an 
exhaustive count of all the plants of the species considered in the quadrats is necessary in 
order to be able to estimate the average number of clusters per quadrat, $m$, and the mean 
number of plants per cluster, $1 + \lambda$. As an alternative therefore we may obtain maximum 
likelihood estimates of $m$ and $\lambda$, following the procedure given by Fisher (1922) and outlined 
by Tippett (1932) for the Poisson series. To obtain these estimates we need a knowledge only 
of the number of quadrats with zero plants, $n_0$, the number of quadrats with one plant, $n_1$, 
and the total number of quadrats, $N$. The values chosen as estimates of $m$ and $\lambda$ are those 
which maximize the expression 

$$L = \log_e \left[(e^{-m})^{n_0}\,(m e^{-m} e^{-\lambda})^{n_1}\,(1 - e^{-m} - m e^{-m} e^{-\lambda})^{N - n_0 - n_1}\right],$$

and hence satisfy the equations 

$$\frac{\partial L}{\partial m} = 0, \qquad \frac{\partial L}{\partial \lambda} = 0.$$

On performing the differentiations, we find that $m$ and $\lambda$ are given by the simultaneous 
equations 

$$e^{-m} = n_0/N, \qquad m e^{-m} e^{-\lambda} = n_1/N, \tag{4}$$

which may be quickly solved. It will be noted that these equations are precisely those 
obtained by equating the observed and theoretical frequencies in the first two groups. The 
values obtained may then be substituted in the expression $m(1 + \lambda)$ to obtain the maximum 
likelihood estimate of the average density of plants of the particular species considered. It 
may be noted in passing that, if $\lambda = 0$, when the double Poisson series reduces to the simple 
Poisson with parameter $m$, the equation $e^{-m} = n_0/N$ is the same as that found by Tippett, in 
the paper referred to above. This is as expected. Data are supplied by Archibald (1948) for 
Armeria maritima and Plantago maritima, two species which were counted on Blakeney 
Marsh. The theoretical series obtained by each of the two methods are set out below in Tables 
1 and 2 and are compared with the series given by a Poisson distribution. $\chi^2$ is not significant 


Table 1. Distribution of Armeria maritima 

No. of plants per quadrat      0      1      2      3      4     5     6     7     8     9     10    11 and over   Total 
Observed number of quadrats    57     6      12     5      5     5     7     1     -     1     1     -             100 
Poisson: Expectations          20.61  32.54  25.70  13.54  5.35  1.69  0.45  0.10  0.02  -     -     -             100.00 
Method I: Expectations         56.44  5.58   10.05  9.56   6.77  4.33  2.75  1.76  1.11  0.68  0.41  0.56          100.00 
Method II: Expectations*       57.07  5.99   10.12  9.46   6.51  4.09  2.56  1.61  0.99  0.59  0.35  0.66          100.00 

$\mu_1' = 1.58$, $\mu_2 = 5.3636$. $\chi^2$ (method I) = 3.725,† $\chi^2$ (method II) = 4.039. Estimate of density, $m(1 + \lambda)$, 
by method II: 1.50 plants per quadrat. 

* The first two frequencies estimated by method II should be exactly equal to the observed 
frequencies, but this would entail estimating $m$ and $\lambda$ to a large number of decimal places. We have taken 
them to three significant figures. Calculation of frequencies using the estimated $m$ and $\lambda$ thus produces 
a slight discrepancy. 

† The suffix attached to $\chi^2$ here indicates the number of degrees of freedom against which its value 
must be judged. Cells containing small expected frequencies were combined, so that no group had an 
expectation of less than 5 units. 


Table 2. Distribution of Plantago maritima 

No. of plants per quadrat      0      1     2      3      4      5      6      7      8     9     10 
Observed number of quadrats    12     8     9      13     6      8      11     7      8     7     3 
Poisson: Expectations          0.64   3.24  8.17   13.76  17.37  17.54  14.76  10.65  6.72  3.77  1.91 
Method I: Expectations         10.99  6.71  10.74  11.22  10.81  10.00  8.80   7.41   6.02  4.73  3.62 
Method II: Expectations        12.00  7.99  11.90  12.10  11.35  10.14  8.59   6.95   5.42  4.08  2.99 

No. of plants per quadrat      11    12    13    14    15    16    17    18    19    20 and over   Total 
Observed number of quadrats    4     1     1     -     -     1     -     -     1     -             100 
Poisson: Expectations          0.88  0.37  0.14  0.05  0.02  0.01  -     -     -     -             100.00 
Method I: Expectations         2.70  1.97  1.40  0.98  0.67  0.45  0.30  0.20  0.13  0.15          100.00 
Method II: Expectations        2.15  1.48  1.01  0.67  0.44  0.28  0.18  0.11  0.07  0.10          100.00 

$\mu_1' = 5.05$, $\mu_2 = 14.3875$. $\chi^2$ (method I) = 5.096, $\chi^2$ (method II) = 7.217. Estimate of density, $m(1 + \lambda)$, 
by method II: 4.58 plants per quadrat. 













for the expectations obtained by either method, and we may conclude that the double Poisson 
series describes the material adequately. The simple Poisson distribution is clearly unsuitable. 
The parameters of the series as estimated by the two methods are given in Table 3. Thus for 
A. maritima the mean number of clusters per quadrat is estimated to be 0.573 and the average 
number of plants per cluster 2.755, using the first method; but it is clear that if maximum 
likelihood estimates are obtained from the first two groups only, the difference is small. This 
difference is a little more marked for P. maritima (a point not to be wondered at when we 
consider the irregularity of the observed frequencies) but even so it is not of sufficient magnitude 
to invalidate any estimates of the number of plants in a given area made by using method II. 


Table 3. Comparison of two methods of estimating parameters 

                 Armeria maritima            Plantago maritima 
                 Method I    Method II       Method I    Method II 
$m$              0.573       0.562           2.209       2.121 
$1+\lambda$      2.755       2.675           2.286       2.159 






















6. The technique used to obtain estimates of $m$ and $\lambda$ in method II may be carried a stage 
further to give the large sample standard errors of these estimates, which are of use in 
deciding when this comparatively simple method may be used without introducing too great 
an error. To obtain these standard errors we form the second derivatives of $L$, where $L$ is 
the quantity defined above, and then solve the equations 

$$E\left(-\frac{\partial^2 L}{\partial m^2}\right) = \frac{1}{(1 - \rho^2)\,\sigma_m^2}, \qquad E\left(-\frac{\partial^2 L}{\partial m\,\partial\lambda}\right) = \frac{-\rho}{(1 - \rho^2)\,\sigma_m \sigma_\lambda}, \qquad E\left(-\frac{\partial^2 L}{\partial\lambda^2}\right) = \frac{1}{(1 - \rho^2)\,\sigma_\lambda^2},$$

where $\sigma_m$, $\sigma_\lambda$ are respectively the standard errors of the estimates of $m$ and $1 + \lambda$, and $\rho$ is the 
correlation between them. On performing the differentiations and solving the equations so 
obtained, we reach the results 

$$\sigma_m^2 = \frac{1}{N}\left(\frac{1 - e^{-m}}{e^{-m}}\right), \tag{5}$$

$$\sigma_\lambda^2 = \frac{m - e^{-m} e^{-\lambda} + e^{-\lambda}(1 - m)^2}{N m^2 e^{-m} e^{-\lambda}}, \tag{6}$$

where the parameters $m$ and $\lambda$ on the right-hand sides of these equations have their true 
population values. These values will generally be unknown, so that sample estimates must be 
used. We then obtain 

$$s_m^2 = \frac{1}{N}\left(\frac{N}{n_0} - 1\right), \tag{7}$$

$$s_\lambda^2 = \frac{1}{N}\left[\frac{N}{n_1} + \frac{N}{n_0}\left(1 - \frac{1}{m}\right)^2 - \frac{1}{m^2}\right], \tag{8}$$

where $m = -\log_e (n_0/N)$. Clearly the standard errors are very large if either $n_0/N$ or $n_1/N$ is 
very small, and the method breaks down if either $n_0$ or $n_1$ is zero. Tables 4-7 show the relation 
between $n_0/N$ and $n_1/N$ and (a) $m$ and $1 + \lambda$ from equations (4); (b) $\sigma_m$ and $\sigma_\lambda$ for $N = 100$, 
from equations (5) and (6); (c) $\sigma_m/m$ and $\sigma_\lambda/(1 + \lambda)$, for $N = 100$. 
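
Entries of Tables 4-7 can be reproduced from equations (4), (5) and (6) with a few lines of Python. This is a sketch with a hypothetical function name (`standard_errors`); the arguments `p0` and `p1` stand for the proportions $n_0/N$ and $n_1/N$.

```python
import math

def standard_errors(p0, p1, N):
    """Return m, 1 + lambda and their large-sample standard errors."""
    m = -math.log(p0)                      # first of equations (4)
    e_lam = p1 / (m * p0)                  # exp(-lambda), second of equations (4)
    var_m = (1.0 - p0) / (N * p0)          # equation (5)
    var_lam = (m - p0 * e_lam + e_lam * (1.0 - m) ** 2) / (
        N * m * m * p0 * e_lam)            # equation (6)
    return m, 1.0 - math.log(e_lam), math.sqrt(var_m), math.sqrt(var_lam)

# First cell of the tables: n0/N = 0.05, n1/N = 0.05, N = 100
m, one_plus_lam, se_m, se_lam = standard_errors(0.05, 0.05, 100)

# A second cell: n0/N = 0.50, n1/N = 0.10
m_b, one_plus_lam_b, se_m_b, se_lam_b = standard_errors(0.50, 0.10, 100)
```

The two cells evaluated here agree with the tabulated values to three decimals; the relative errors of table 7 follow by dividing `se_lam` by `one_plus_lam`.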













If the tables are entered with the observed values of $n_0/N$ and $n_1/N$, we obtain the estimates 
of $m$ and $1 + \lambda$ and of their standard errors. It is seen that the absolute values of $\sigma_m$ and $\sigma_\lambda$ 
decrease as $n_0/N$ and $n_1/N$ increase. The relative values of the standard errors, however, 
behave in a rather different way, decreasing to a minimum and then increasing again. 






























































































































































Table 4. Values of $m$, $\sigma_m$ and $\sigma_m/m$ for $N = 100$ 

$n_0/N$          0.05   0.10   0.20   0.30   0.40   0.50   0.60   0.70   0.80   0.90 
$m$              2.996  2.303  1.609  1.204  0.916  0.693  0.511  0.357  0.223  0.105 
$\sigma_m$       0.436  0.300  0.200  0.153  0.122  0.100  0.082  0.065  0.050  0.033 
$\sigma_m/m$     0.145  0.130  0.124  0.127  0.134  0.144  0.160  0.183  0.224  0.317 

Table 5. Values of $1 + \lambda$ 

                 $n_0/N$ 
$n_1/N$          0.05   0.10   0.20   0.30   0.40   0.50   0.60   0.70   0.80   0.90 
0.05             2.097  2.527  2.862  2.977  2.992  2.936  2.814  2.609  2.272  1.637 
0.10             1.404  1.834  2.169  2.284  2.299  2.243  2.120  1.916  1.579  - 
0.20             -      1.141  1.476  1.591  1.605  1.550  1.427  1.223  -      - 
0.30             -      -      1.070  1.186  1.200  1.144  1.022  -      -      - 

Table 6. Values of $\sigma_\lambda$ for $N = 100$ 

                 $n_0/N$ 
$n_1/N$          0.05   0.10   0.20   0.30   0.40   0.50   0.60   0.70   0.80   0.90 
0.05             0.536  0.480  0.451  0.441  0.434  0.428  0.421  0.410  0.388  0.317 
0.10             0.433  0.361  0.321  0.307  0.297  0.288  0.27   0.261  0.225  - 
0.20             -      0.283  0.231  0.210  0.196  0.182  0.164  0.134  -      - 
0.30             -      -      0.191  0.166  0.147  0.128  0.101  -      -      - 

Table 7. Values of $\sigma_\lambda/(1 + \lambda)$ for $N = 100$ 

                 $n_0/N$ 
$n_1/N$          0.05   0.10   0.20   0.30   0.40   0.50   0.60   0.70   0.80   0.90 
0.05             0.256  0.190  0.158  0.148  0.145  0.146  0.149  0.157  0.171  0.193 
0.10             0.309  0.197  0.148  0.134  0.129  0.129  0.131  0.136  0.143  - 
0.20             -      0.248  0.156  0.132  0.122  0.117  0.115  0.109  -      - 
0.30             -      -      0.179  0.140  0.123  0.112  0.099  -      -      - 























It will be noted that values of $1+\lambda$ and $\sigma_\lambda$ are given for a somewhat curtailed range of
$n_1/N$. The reason for this is that as $n_1/N$ increases for fixed $n_0/N$, the equations (4) yield
successively smaller and eventually negative values of $\lambda$. As a limiting case we have the
situation where the two ratios are exactly equal to the first two terms of a simple Poisson
series, that is, a special case of the double Poisson with $\lambda = 0$. Although it is possible that the
double Poisson series, with $\lambda$ taking negative values, may provide a ‘graduation’ for











24 A generalization of Poisson’s binomial limit for use in ecology 


observational data, the parameters in this case lose their physical significance, and the
situation has therefore not been further considered here. If the true population $\lambda$ is zero or
has a small positive value, we may expect that maximum likelihood estimates of $\lambda$ will
sometimes be negative, owing to sampling fluctuations. It follows that estimates of $\lambda$ which
are near to zero should be treated with caution, even though the standard errors of such
estimates may not be large.
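
The entries of Tables 4 to 7 can be checked numerically. The following is a minimal sketch, assuming that equations (4), which are not reproduced in this section, amount to $m = -\log(n_0/N)$ and $\lambda = \log(m n_0/n_1)$; the standard errors are large-sample (delta-method) approximations worked out under that assumption. The function name `method_two_estimates` is hypothetical.

```python
import math

def method_two_estimates(p0, p1, n_quadrats=100):
    """Estimates of m and lambda from the proportions of quadrats holding
    no plant (p0) and exactly one plant (p1), with large-sample standard
    errors. Assumes P(0) = exp(-m) and P(1) = m * exp(-(m + lambda))."""
    m = -math.log(p0)
    lam = math.log(m * p0 / p1)
    # Var(m): multinomial variance of p0 carried through m = -log(p0).
    var_m = (1.0 - p0) / (n_quadrats * p0)
    # Var(lambda): a is p0 times the derivative of lambda with respect to p0.
    a = (m - 1.0) / m
    var_lam = (a * a * (1.0 - p0) / p0 + (1.0 - p1) / p1 + 2.0 * a) / n_quadrats
    return m, lam, math.sqrt(var_m), math.sqrt(var_lam)

# First column of the tables: n0/N = n1/N = 0.05, N = 100.
m, lam, se_m, se_lam = method_two_estimates(0.05, 0.05)
print(round(m, 3), round(1.0 + lam, 3), round(se_m, 3), round(se_lam, 3))
```

A negative value of `lam` returned by this function corresponds to the excluded region of the tables discussed above.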

7. We have still to compare the relative accuracy of methods I and II in the estimation of
the population mean, since this is the quantity with which the ecologist is primarily concerned.
By method I the mean of the sample is taken as estimate of the population mean,
with a standard error $\sqrt{\mu_2/N}$, where $\mu_2$ is given by equation (3).

Using the maximum likelihood method, the simplest way of estimating the population
mean is to estimate $m$ and $1+\lambda$ from equations (4), and then to form the product $M = m(1+\lambda)$.
The large sample standard error of $M$ is then given by

$$\sigma_M^2 = m^2\sigma_\lambda^2 + (1+\lambda)^2\sigma_m^2 + 2m(1+\lambda)\rho\,\sigma_m\sigma_\lambda.$$

On substituting values already found in the right-hand side of this equation, we reach the
result

$$\sigma_M^2 = \frac{1}{N}\left[\frac{(m-\lambda-2)^2}{e^{-m}} + \frac{m^2}{m\,e^{-(m+\lambda)}} - (\lambda+2)^2\right]. \tag{9}$$

As before, this may be estimated by

$$\frac{1}{N}\left[\frac{(m-\lambda-2)^2}{n_0/N} + \frac{m^2}{n_1/N} - (\lambda+2)^2\right],$$

where for $m$ and $\lambda$ we substitute the estimates obtained from equations (4).
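
As a hedged numerical illustration of equation (9), the sketch below computes $M = m(1+\lambda)$ and its large-sample standard error from the two proportions. It assumes, as above, that the zero and unit proportions are $e^{-m}$ and $m e^{-(m+\lambda)}$; the function name `mean_and_se` is hypothetical.

```python
import math

def mean_and_se(p0, p1, n_quadrats=100):
    """Mean M = m(1 + lambda) and its large-sample standard error,
    computed from the zero and unit proportions p0 and p1."""
    m = -math.log(p0)                 # assumed relation P(0) = exp(-m)
    lam = math.log(m * p0 / p1)       # assumed relation P(1) = m exp(-(m + lambda))
    mean = m * (1.0 + lam)
    bracket = (m - lam - 2.0) ** 2 / p0 + m * m / p1 - (lam + 2.0) ** 2
    return mean, math.sqrt(bracket / n_quadrats)

mean, se = mean_and_se(0.05, 0.05)
print(round(mean, 3), round(se, 3))   # compare with the first entries of Tables 8 and 9
```

Passing a different `n_quadrats` rescales the standard error by the square root of the sample-size ratio, which is the same adjustment as the footnote to Table 9.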
(a) Table 8 shows values of $M$ and (b) Table 9 values of the standard error of the maximum
likelihood estimate of $M$ for $N = 100$* (from equation (9)), in terms of the population
proportions in the first two groups. These tables may also be used to give estimates of $M$ and $\sigma_M$
based on the observed frequencies $n_0$ and $n_1$.





Table 8. Values of the mean $M$ (rows: $n_1/N$; columns: $n_0/N$)

$n_1/N$ \ $n_0/N$ | 0.05 | 0.10 | 0.20 | 0.30 | 0.40 | 0.50 | 0.60 | 0.70 | 0.80 | 0.90
0.05 | 6.283 | 5.819 | 4.606 | 3.585 | 2.741 | 2.035 | 1.437 | 0.931 | 0.507 | 0.172
0.10 | 4.206 | 4.223 | 3.491 | 2.750 | 2.106 | 1.554 | 1.083 | 0.683 | 0.352 | -
0.20 | - | 2.627 | 2.375 | 1.916 | 1.471 | 1.074 | 0.729 | 0.436 | - | -
0.30 | - | - | 1.722 | 1.427 | 1.099 | 0.793 | 0.522 | - | - | -

Table 9. Standard error $\sqrt{\dfrac{1}{N}\left[\dfrac{(m-\lambda-2)^2}{e^{-m}} + \dfrac{m^2}{m\,e^{-(m+\lambda)}} - (\lambda+2)^2\right]}$ for $N = 100$ (rows: $n_1/N$; columns: $n_0/N$)

$n_1/N$ \ $n_0/N$ | 0.05 | 0.10 | 0.20 | 0.30 | 0.40 | 0.50 | 0.60 | 0.70 | 0.80 | 0.90
0.05 | 1.304 | 1.042 | 0.789 | 0.623 | 0.495 | 0.389 | 0.298 | 0.215 | 0.138 | 0…
0.10 | 0.954 | 0.691 | 0.529 | 0.426 | 0.342 | 0.270 | 0.205 | 0.146 | 0.089 | -
0.20 | - | 0.471 | 0.325 | 0.264 | 0.213 | 0.167 | 0.124 | 0.082 | - | -
0.30 | - | - | 0.233 | 0.181 | 0.144 | 0.110 | 0.077 | - | - | -














































* The standard error for a sample of size $N_1$ is obtained by multiplying by $10/\sqrt{N_1}$.








MARJORIE THOMAS 25


Table 10 shows the ratio of (i) standard error of estimate $M = m(1+\lambda)$ based on the mean
of the complete sample, to (ii) standard error of maximum likelihood estimate, that is, of
$\sqrt{\mu_2/N}$ from equation (3) to $\sigma_M$ from equation (9). This ratio is independent of $N$, but it is, of
course, a large sample limit. The figures in this table indicate what loss in accuracy must be
faced for the sake of gain in speed resulting from a count of only the first two frequencies,
$n_0$ and $n_1$. For instance, for the standard error of the more approximate method to be no
more than about $ times that of the more accurate, the distribution should be such that at 
least 60 % of the quadrats contain one or no plants. The table also shows the relative number 
of counts needed for a given accuracy; thus, for example, if the expected proportions are
$n_0/N = 0.20$, $n_1/N = 0.30$, then $(1/0.60)^2 = 2.8$ times as many quadrats must be counted
using method II as using method I for roughly the same accuracy in the estimate of $m(1+\lambda)$.
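
The worked figure in this paragraph can be reproduced with a short calculation. This is a sketch under stated assumptions: equation (3) is not reproduced in this section, and the variance is taken here to be $\mu_2 = m(1+3\lambda+\lambda^2)$, the variance of a Poisson number of clusters each holding one plant plus a Poisson($\lambda$) number of further plants. The function name `se_ratio` is hypothetical.

```python
import math

def se_ratio(p0, p1):
    """Ratio (i)/(ii): standard error of the sample mean (method I) over the
    standard error of the maximum likelihood estimate of M (method II)."""
    m = -math.log(p0)
    lam = math.log(m * p0 / p1)
    mu2 = m * (1.0 + 3.0 * lam + lam * lam)       # assumed form of equation (3)
    # N times the variance given by equation (9); N cancels in the ratio.
    n_var_ml = (m - lam - 2.0) ** 2 / p0 + m * m / p1 - (lam + 2.0) ** 2
    return math.sqrt(mu2 / n_var_ml)

ratio = se_ratio(0.20, 0.30)
relative_quadrats = (1.0 / ratio) ** 2            # quadrats needed by method II per quadrat of method I
print(round(ratio, 3), round(relative_quadrats, 1))
```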

It is for the ecologist to decide on the balance to strike between accuracy and field labour. 


Table 10. Ratio of standard errors of estimates of $M$, (i)/(ii):
(i) from moment solution, (ii) from maximum likelihood solution
(rows: $n_1/N$; columns: $n_0/N$)

$n_1/N$ \ $n_0/N$ | 0.05 | 0.10 | 0.20 | 0.30 | 0.40 | 0.50 | 0.60 | 0.70 | 0.80 | 0.90
0.05 | 0.311 | 0.410 | 0.510 | 0.580 | 0.640 | 0.695 | 0.749 | 0.805 | 0.867 | 0.947
0.10 | 0.280 | 0.450 | 0.581 | 0.657 | 0.718 | 0.772 | 0.824 | 0.877 | 0.935 | -
0.20 | - | 0.387 | 0.635 | 0.736 | 0.801 | 0.855 | 0.906 | 0.956 | - | -
0.30 | - | - | 0.601 | 0.766 | 0.850 | 0.911 | 0.963 | - | - | -









































8. We may conclude that the double Poisson series may prove useful for the description of 
plant distributions. We have attempted so to design the mathematical set-up that the para- 
meters of the derived distribution may be capable of physical interpretation. Sufficient data 
are not available to test whether the parameters $m$ and $1+\lambda$ do, in fact, measure the average
number of clusters per quadrat and the average number of plants per cluster, respectively,
but the good fit obtained for the two series discussed above suggests that for these series, at
least, the mathematical model provides an excellent graduation. Further, if this series can be 
used in other cases where the maximum likelihood method of estimation is satisfactory, the 
labour of the field ecologist will be considerably lightened in that a complete enumeration of 
plant numbers need not be carried out. A knowledge of the zero and unit classes is all that is 
required. The tables provided should be helpful, both in estimating the parameters and in 
deciding which method of estimation should be used. 


I wish to thank Prof. E. S. Pearson and Dr F. N. David for suggestions and help in
the preparation of this paper, and Miss E. E. A. Archibald for supplying me with data
for the numerical illustrations.


REFERENCES 


ARCHIBALD, E. E. A. (1948). Ann. Bot., Lond., N.S. 12, 221.
FISHER, R. A. (1922). Philos. Trans. A, 222, 309.
NEYMAN, J. (1939). Ann. Math. Statist. 10, 35.
TIPPETT, L. H. C. (1932). Proc. Roy. Soc. A, 137, 434.










THE ESTIMATION AND COMPARISON OF RESIDUAL REGRESSIONS 
WHERE THERE ARE TWO OR MORE RELATED SETS 
OF OBSERVATIONS 


By A. H. CARTER, King’s College, Cambridge 


1. INTRODUCTION 


The problem to be investigated concerns a series of parallel samples each comprising observa- 
tions in two or more variates, the corresponding members of the different samples having 
certain elements in common. Such a situation would occur in the case of (a) successive sets of 
measurements on the same animals or plants or experimental plots, or (b) varietal trials where
a complete experiment (involving each of the varieties to be tested) is repeated in a number of 
districts. The essential feature is that the different samples comprise observations relating to 
the same underlying material, be it animals, or plants, or plots. It is desired to derive esti- 
mates of the residual regression effects where the effect on one measure of a number of others 
is being studied, and to develop tests of their homogeneity as between samples. Since the 
samples are non-independent, the usual tests of homogeneity of regressions among independent
samples (Bartlett, 1934; Welch, 1935), will be no longer applicable. Further, in so far
as it is desired to ascertain and compare the net regression effects, after elimination of the 
effect of the underlying common elements, the estimates of the regression coefficients them- 
selves will differ from those generally computed. 

As an example of the type of problem to be considered, suppose we wished to compare the 
regression of sugar content on yield for several varieties of sugar beet. Replicated plots might 
be set up in each of a number of different localities, with one variety to each locality. Regres- 
sion coefficients for each variety could then be tested for homogeneity by the usual methods, 
the samples being independent. Obviously, however, any conclusions drawn from such an 
experiment would be of little value, since varietal and regional differences would be con- 
founded. Suppose, instead, neighbouring plots in each district were allocated to the several 
varieties, one or more plots per variety, the experiment being repeated in a number of districts.
If region-variety interaction could be assumed negligible, inherent differences in soil and 
climate between localities would affect each variety to the same extent. Any conclusions 
now drawn would clearly be of general application. On the other hand, since every district 
contributes one or more pairs of observations to each varietal sample, the samples will be no 
longer independent, and the standard methods will fail. 

The problem of testing for significance the difference between the regression coefficients 
from two correlated samples was first investigated by Yates (1939). On the assumption that 
the samples came from two populations in which the residual of the dependent variate (after 
allowing for the effects of the regression) was normally distributed, he demonstrated that the 
difference between two simple or partial regression coefficients was itself normally distributed 
with variance expressible in terms of the residual variances and covariance of the two popula- 
tions; and showed how these quantities might be estimated from the data. Since, however, 
the estimate of the variance of the difference between the regression coefficients was based on 
an unassigned number of degrees of freedom, the resulting test was only an approximate one; 










its results must be interpreted with care in the case of small samples. The tests developed
below are exact tests, and, moreover, are applicable to the case of any number of samples. 

In the course of an experiment by the Wool Industries Research Association (Galpin, 1947), 
Daniels applied the method of fitting constants to the present problem. In the experiment, 
measurements of two external characteristics of a number of sheep were recorded at three- 
monthly intervals. The problem was to determine whether the relationship between the two 
variates changed with the season. 

The method employed below to derive the estimates and significance tests in the general 
case is also that of fitting constants. This technique, of wide application to statistical 
problems, is particularly useful in those cases where the magnitudes of certain effects require 
to be estimated, and the effects tested for significance. In all cases where it is employed, the 
method is of value in indicating clearly the basic assumptions which have been made. The 
observed values are presumed drawn from a population whose form (the ‘model’) is
specified, depending on a number of unknown parameters. The constants to be fitted are the 
estimates of these parameters. Where the residual variation in the population is assumed 
normal, as will be the case throughout this paper, the fitting of constants by least squares will, 
of course, yield the maximum likelihood estimates. 

The general theoretical problem is first formulated, and a general result forming the basis 
of the tests of homogeneity developed later is considered. We proceed to derive specific 
formulae, in convenient form for calculation, for the general case of $p$ correlated samples
with $q$ independent variates. The special cases of $p = 2$ and of $q = 1$ are then discussed, and
the known results for independent samples briefly deduced and compared. After a discussion 
of the underlying assumptions made, and a comparison of the method with that proposed by 
Yates, the paper concludes with two examples by way of illustration. 


2. DISCUSSION OF METHOD


The general case to be considered is that of $p$ samples or ‘lots’, each of $n$ observations in the
$(q+1)$ variates $y, x_1, x_2, \ldots, x_q$. Denote the $j$th set of observations in the $i$th sample by
$(y_{ij}, x_{uij})$ $(u = 1,2,\ldots,q;\ i = 1,2,\ldots,p;\ j = 1,2,\ldots,n)$. The $np$ sets may be considered as an
$n \times p$ array, the $p$ columns corresponding to the samples. These are correlated owing, it is
supposed, to a common effect (the ‘correlation-effect’) running through their $j$th members,
i.e. through the $j$th row in the array considered.


We regard the $p$ ‘lots’ as samples from $p$ related subpopulations in which the values of $y_{ij}$
are assumed specified by

$$y_{ij} = \sum_{u=1}^{q} \beta_{ui} x_{uij} + \gamma_i^* + \alpha_j^* + \mu^* + \epsilon_{ij}, \tag{2-1}$$

where $\beta_{ui}$ $(u = 1, 2, \ldots, q)$ measures the true regression effects in the $i$th subpopulation,
$\gamma_i^*$ measures the ‘lot’ effect (column-effect) common to the members of the $i$th subpopulation,
$\alpha_j^*$ measures the ‘correlation-effect’ (row-effect) common to the $j$th members of all
subpopulations,
$\mu^*$ denotes the general (population) mean,
and $\epsilon_{ij}$ is a random residual, normally distributed about zero mean with constant (unknown)
variance $\sigma^2$.










In the sugar-beet experiment cited above, for example, the different varieties constitute 
the ‘lots’, the column-effect would be that due to variety, and the row-effect that due to 
locality. 

It is required (i) to obtain estimates $b_{ui}$ of the partial regression coefficients $\beta_{ui}$, (ii) to test
hypotheses of the type $\beta_{ui} = \beta_u$ $(i = 1,2,\ldots,p)$, i.e. to test the homogeneity as between
samples of the partial regression coefficients, and (iii) to derive a test of the significance of the
difference between any two regression estimates, say those of $y$ on $x_u$, for the $l$th and $m$th
samples, the hypothesis to be tested being

$$\beta_{ul} = \beta_{um} \quad (l \ne m).$$

In view of the assumed relation (2-1), the tests under (ii) and (iii) above are particular cases
of the test of a general linear hypothesis (Kolodziejczyk, 1935). In the present case, the likelihood
approach yields a test criterion depending on the quantities $S_a$ and $S_r$, where $S_a$ is the
absolute minimum (i.e. for variations in all the parameters of (2-1)), and $S_r$ is the relative
minimum (i.e. taking account of the conditions implied by the hypothesis under test) of the
sum of squares

$$\Sigma = \sum_{i=1}^{p}\sum_{j=1}^{n} (y_{ij} - \eta_{ij})^2,$$

where
$$\eta_{ij} = E(y_{ij}) = \sum_{u=1}^{q} \beta_{ui} x_{uij} + \gamma_i^* + \alpha_j^* + \mu^*. \tag{2-2}$$
u= 


It will be found convenient, however, to adopt a slightly different approach, and to regard 
the problem entirely as a formal multiple regression one. For this, ‘dummy’ variates will be 
introduced, as explained below, to carry the parameters $\gamma_i^*$, $\alpha_j^*$, $\mu^*$ in (2-2). (For the use of
dummy variates in this connexion see, for example, Bartlett (1933) and Yates (1933).) 


Using this approach, all our required tests of homogeneity are particular cases of the following 
general result. 


To test the homogeneity of a group of the partial regression coefficients in a single sample.
Suppose the regression of $y$ on $t$ independent variates $x_1, x_2, \ldots, x_t$ is linear so that the
expected value $\eta$ of $y$ is

$$\eta = \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_r x_r + \beta_{r+1} x_{r+1} + \ldots + \beta_t x_t \tag{2-3}$$

(the variates being measured from their means). $(y - \eta)$ is then assumed normally distributed
about zero mean, with unknown variance $\sigma^2$. It is required to test, on the basis of a sample of
size $n$, the hypothesis, $H_0$, that $\beta_1 = \beta_2 = \ldots = \beta_r = \beta_0$, say. This is clearly a linear hypothesis
of order $(r-1)$, and the appropriate test follows from Kolodziejczyk’s general theorem.

Denote the $j$th set of observations by $y_j, x_{1j}, x_{2j}, \ldots, x_{tj}$ $(j = 1,2,\ldots,n)$, so that, the $x$’s
being assumed errorless,

$$E(y_j) = \eta_j = \beta_1 x_{1j} + \beta_2 x_{2j} + \ldots + \beta_t x_{tj}.$$

Write $\Sigma$ for $\sum_{j=1}^{n} (y_j - \eta_j)^2$. Let $S_a$ be the absolute minimum of $\Sigma$ for variations in all $t$
parameters (the $\beta$’s), and let $S_r$ be the relative minimum of $\Sigma$ under the conditions of $H_0$, i.e. of
$\Sigma'$ say.
Then, provided $H_0$ holds, the two members on the right-hand side of the identity

$$S_r = S_a + (S_r - S_a) \tag{2-4}$$

are distributed independently as $\chi^2\sigma^2$ with $(n-t)$ and $(r-1)$ degrees of freedom respectively.







The hypothesis may therefore be tested by taking the ratio of the mean squares

$$(S_r - S_a)/(r-1) \quad\text{and}\quad S_a/(n-t),$$

which will have Fisher’s variance-ratio distribution for $(r-1)$ and $(n-t)$ degrees of freedom,
if the hypothesis is correct.
In particular, $S_a$, the minimum of $\Sigma$ for variations in $\beta_1, \beta_2, \ldots, \beta_t$, is given by

$$S_a = \sum_{j=1}^{n} (y_j - b_1 x_{1j} - \ldots - b_t x_{tj})^2,$$

where the $b$’s are the solutions of the normal equations

$$\sum_{j=1}^{n} x_{kj}\,(y_j - b_1 x_{1j} - \ldots - b_t x_{tj}) = 0 \quad (k = 1, 2, \ldots, t).$$

That is, $S_a$ is the residual sum of squares for the sample regression of $y$ on $x_1, x_2, \ldots, x_t$.
Similarly, $S_r$ is the minimum of

$$\Sigma' = \sum_{j=1}^{n} (y_j - \beta_0 x_{1j} - \beta_0 x_{2j} - \ldots - \beta_0 x_{rj} - \beta_{r+1} x_{r+1,j} - \ldots - \beta_t x_{tj})^2
= \sum_{j=1}^{n} (y_j - \beta_0 x_{0j} - \beta_{r+1} x_{r+1,j} - \ldots - \beta_t x_{tj})^2,$$

where $x_{0j} = \sum_{u=1}^{r} x_{uj}$, for variations in $\beta_0, \beta_{r+1}, \ldots, \beta_t$. That is, $S_r$ is the residual sum of squares
for the sample regression of $y$ on $x_0, x_{r+1}, \ldots, x_t$.

With these interpretations of the quantities $S_a$ and $S_r$, the identity (2-4) yields a ready
practical test, in analysis of variance form, of the hypothesis $H_0$, i.e. of the homogeneity of
$b_1, b_2, \ldots, b_r$. The estimate $s^2$ of $\sigma^2$ is, by the usual methods of regression analysis, $S_a/(n-t)$.
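
The test just described can be sketched in a few lines of code. The example below is a minimal illustration, not the author's computing scheme: it fits both regressions by solving the normal equations directly and forms the variance ratio from the absolute and relative minima. All function names and the data are hypothetical, and the variates are taken as already measured from their means.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def residual_ss(columns, y):
    """Residual sum of squares for the regression of y on the given columns."""
    t = len(columns)
    gram = [[sum(a * b for a, b in zip(columns[i], columns[k])) for k in range(t)]
            for i in range(t)]
    rhs = [sum(a * b for a, b in zip(columns[i], y)) for i in range(t)]
    coef = solve(gram, rhs)                      # normal equations
    fitted = [sum(coef[i] * columns[i][j] for i in range(t)) for j in range(len(y))]
    return sum((yj - fj) ** 2 for yj, fj in zip(y, fitted))

def homogeneity_f(columns, y, r):
    """Variance ratio for the hypothesis that the first r coefficients are equal."""
    n, t = len(y), len(columns)
    s_a = residual_ss(columns, y)                          # absolute minimum
    pooled = [sum(vals) for vals in zip(*columns[:r])]     # x_0j = x_1j + ... + x_rj
    s_r = residual_ss([pooled] + columns[r:], y)           # relative minimum
    return s_a, s_r, ((s_r - s_a) / (r - 1)) / (s_a / (n - t))

# Made-up data for illustration only (each variate sums to zero).
x1 = [-3.0, -2.0, -1.0, 0.0, 0.0, 1.0, 2.0, 3.0]
x2 = [1.0, -1.0, 2.0, -2.0, 1.0, -1.0, 2.0, -2.0]
x3 = [2.0, 0.0, -2.0, 1.0, -1.0, 0.0, -1.0, 1.0]
y = [-4.1, -4.8, 1.2, -2.9, 1.6, 0.2, 6.4, 2.4]
s_a, s_r, f_ratio = homogeneity_f([x1, x2, x3], y, r=2)
print(s_a, s_r, f_ratio)
```

The ratio would then be referred to the variance-ratio distribution with $(r-1)$ and $(n-t)$ degrees of freedom, here 1 and 5.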

As a corollary to the foregoing general result, we may derive a test of significance of the
departure from zero of a group of the partial regression coefficients. The hypothesis, $H_0'$ say, is
that $\beta_1 = \beta_2 = \ldots = \beta_r = 0$ (i.e. that $\beta_0 = 0$ in $H_0$), a linear hypothesis of order $r$. $S_a$ is as before;
we require the relative minimum $S_r'$, say, of $\Sigma'$ when $\beta_0 = 0$. $S_r'$ is, in fact, the residual sum of
squares for the sample regression of $y$ on $x_{r+1}, x_{r+2}, \ldots, x_t$.

On the hypothesis $H_0'$, the two members on the right-hand side of the identity

$$S_r' = S_a + (S_r' - S_a) \tag{2-5}$$

are distributed independently as $\chi^2\sigma^2$ with $(n-t)$ and $r$ degrees of freedom.
Hence, as before, the hypothesis may be tested by referring the ratio of the mean squares to
the variance-ratio distribution.


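It may be noted at this stage that equation (2-2) may, without affecting the $\beta$’s, be written
in the form

$$\eta_{ij} = \sum_{u=1}^{q} \beta_{ui} x_{uij} + \gamma_i + \alpha_j, \tag{2-6}$$

where $\gamma_i = \gamma_i^* - \bar\gamma^*$ $(i = 1,2,\ldots,p)$,
$\alpha_j = \alpha_j^* + \bar\gamma^* + \mu^*$ $(j = 1,2,\ldots,n)$,
and $\bar\gamma^* = \dfrac{1}{p}\sum_{i=1}^{p} \gamma_i^*$.

Since $\sum_{i=1}^{p} \gamma_i = 0$, there remain in addition to the $\beta$’s only $(n+p-1)$ independent constants
to be fitted.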













3. GENERAL CASE OF $p$ CORRELATED SAMPLES WITH $q$ INDEPENDENT VARIATES

As before, denoting the observed values by
$(y_{ij}, x_{uij})$ $(u = 1,2,\ldots,q;\ i = 1,2,\ldots,p;\ j = 1,2,\ldots,n)$,
the $y_{ij}$ are assumed distributed according to the specification (2-1). In virtue of (2-6), this
may be written

$$y_{ij} = \sum_{u=1}^{q} \beta_{ui} x_{uij} + \gamma_i + \alpha_j + \epsilon_{ij}. \tag{3-1}$$

Now

$$\eta_{ij} = \sum_{u=1}^{q} \beta_{ui} x_{uij} + \gamma_i + \alpha_j
= \sum_{u=1}^{q}\sum_{k=1}^{p} \beta_{uk}\, z_{p(u-1)+k,\,ij} + \sum_{l=1}^{p-1} \gamma_l\, z_{qp+l,\,ij} + \sum_{r=1}^{n} \alpha_r\, z_{qp+p+r-1,\,ij}, \tag{3-2}$$

where the $z$’s are ‘dummy’ variates taking the specified values

$$z_{p(u-1)+k,\,ij} = x_{uij}\ (i = k), \qquad = 0\ (i \ne k);$$
$$z_{qp+l,\,ij} = 1\ (i = l), \qquad = 0\ (i \ne l,\ p), \qquad = -1\ (i = p);$$
$$z_{qp+p+r-1,\,ij} = 1\ (j = r), \qquad = 0\ (j \ne r)$$

$(u = 1,2,\ldots,q;\ i, k = 1,2,\ldots,p;\ l = 1,2,\ldots,p-1;\ r = 1,2,\ldots,n)$.
That (3-2) is equivalent to (2-6) may readily be verified by inserting any particular values
of $i, j$.
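
The dummy-variate construction can be made concrete with a short sketch. The function below builds, for one observation, the row of $(qp+p+n-1)$ dummy variates following the rule that the last ‘lot’ carries $-1$ on every lot-effect dummy; indices are 0-based in the code, and the function name and data layout (`x[u][i][j]`) are hypothetical.

```python
def dummy_row(x, i, j, p, q, n):
    """Return the (qp + p + n - 1) dummy variates for observation (i, j).

    x[u][i][j] holds the value of variate u, sample i, row j (0-based).
    """
    z = [0.0] * (q * p + p + n - 1)
    for u in range(q):
        z[p * u + i] = x[u][i][j]        # carries the regression coefficient of variate u, sample i
    if i != p - 1:
        z[q * p + i] = 1.0               # carries the lot effect of sample i
    else:
        for l in range(p - 1):
            z[q * p + l] = -1.0          # last lot effect is minus the sum of the others
    z[q * p + p - 1 + j] = 1.0           # carries the row effect of row j
    return z

# Hypothetical layout: q = 2 variates, p = 3 samples, n = 2 rows.
x = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
     [[0.5, 1.5], [2.5, 3.5], [4.5, 5.5]]]
print(dummy_row(x, 2, 1, p=3, q=2, n=2))
```

Multiplying such a row by the vector of constants (regression coefficients, then the first $p-1$ lot effects, then the $n$ row effects) gives the expected value of the observation, which is the equivalence asserted in the text.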
The least squares estimates $b_{ui}$, $c_i$, $a_j$ of the constants in (3-1) are obtained by minimizing
$\Sigma = \sum_i\sum_j (y_{ij} - \eta_{ij})^2$.† From (3-2), it follows that $b_{11}, b_{12}, \ldots, b_{qp}, c_1, \ldots, c_{p-1}, a_1, \ldots, a_n$ are in fact the
estimates of the partial regression coefficients of $y$ on the $(qp+p+n-1)$ independent variates
$z_1, z_2, \ldots, z_{qp}, z_{qp+1}, \ldots, z_{qp+p-1}, z_{qp+p}, \ldots, z_{qp+p+n-1}$ respectively, there being in all
$pn$ $(i = 1,2,\ldots,p;\ j = 1,2,\ldots,n)$
observations in each variate. In terms of ordinary multiple regression analysis there will be
$(qp+p+n-1)$ normal equations which we require to solve for our $(qp+p+n-1)$ estimates.
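We now define the sample quantities

$$P_{uvik} = \sum_j (x_{uij} - x_{ui.})(x_{vkj} - x_{vk.}), \qquad P'_{uvik} = \left(\delta_{ik} - \frac{1}{p}\right) P_{uvik},$$
$$Q_{uik} = \sum_j (x_{uij} - x_{ui.})(y_{kj} - y_{k.}), \qquad Q'_{uik} = \left(\delta_{ik} - \frac{1}{p}\right) Q_{uik},$$
$$R_{ik} = \sum_j (y_{ij} - y_{i.})(y_{kj} - y_{k.}), \qquad R'_{ik} = \left(\delta_{ik} - \frac{1}{p}\right) R_{ik}$$

$(u, v = 1,2,\ldots,q;\ i, k = 1,2,\ldots,p)$,
where $\delta_{ik} = 1$ $(k = i)$, $= 0$ $(k \ne i)$, and the dot in the subscript denotes the mean, e.g. $x_{ui.} = \dfrac{1}{n}\sum_j x_{uij}$.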


Eliminating the $c_i$ and $a_j$ from the normal equations, we obtain after some reduction the
set of equations for determining the $b_{ui}$:

$$\sum_{u=1}^{q} b_{ui} P_{uvii} - \frac{1}{p}\sum_{u=1}^{q}\sum_{k=1}^{p} b_{uk} P_{uvki} = Q_{vii} - \frac{1}{p}\sum_{k=1}^{p} Q_{vik} \quad (v = 1,2,\ldots,q;\ i = 1,2,\ldots,p) \tag{3-3}$$

or

$$\sum_{u=1}^{q}\sum_{k=1}^{p} b_{uk} P'_{uvki} = \sum_{k=1}^{p} Q'_{vik}. \tag{3-3'}$$

† For simplicity, $\sum_i$, $\sum_j$ will be written for $\sum_{i=1}^{p}$, $\sum_{j=1}^{n}$ throughout.





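In the notation of §2, $S_a$ is, in the present case, equal to $\sum_i\sum_j (y_{ij} - Y_{ij})^2$, where $Y_{ij}$ is the
sample estimate of $\eta_{ij}$, i.e.

$$Y_{ij} = \sum_{u=1}^{q} b_{ui} x_{uij} + c_i + a_j.$$

After reduction, and using the relations (3-3), we obtain

$$\sum_i\sum_j (y_{ij} - Y_{ij})^2 = \sum_i \Big(R_{ii} - \sum_{u=1}^{q} b_{ui} Q_{uii}\Big) - \frac{1}{p}\sum_{i,k=1}^{p} \Big(R_{ik} - \sum_{u=1}^{q} b_{ui} Q_{uik}\Big)$$
$$= \sum_i \Big(R_{ii} - \sum_{u,v=1}^{q} b_{ui} b_{vi} P_{uvii}\Big) - \frac{1}{p}\sum_{i,k=1}^{p} \Big(R_{ik} - \sum_{u,v=1}^{q} b_{ui} b_{vk} P_{uvik}\Big), \tag{3-4}$$

or

$$\sum_i\sum_j (y_{ij} - Y_{ij})^2 = \sum_{i,k=1}^{p} \Big(R'_{ik} - \sum_{u=1}^{q} b_{ui} Q'_{uik}\Big)
= \sum_{i,k=1}^{p} \Big(R'_{ik} - \sum_{u,v=1}^{q} b_{ui} b_{vk} P'_{uvik}\Big). \tag{3-4'}$$

We note in passing that, since $S_a$ is in this case based on
$\{np - (qp+p+n-1)\} = \{(n-1)(p-1) - pq\}$
degrees of freedom, the estimate of the residual variance $\sigma^2$ is

$$s^2 = \sum_i\sum_j (y_{ij} - Y_{ij})^2 / \{(n-1)(p-1) - pq\}. \tag{3-5}$$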


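It is required to test the homogeneity as between samples of the respective partial regression
coefficients of $y$ on $x_1, x_2, \ldots, x_q$. Consider those on $x_1$, the hypothesis, $H_1$ say, to be
tested then being that $\beta_{11} = \beta_{12} = \ldots = \beta_{1p} = \beta_1'$, say. For the multiple regression of $y$ on the
$(qp+n)$ independent variates $z_1', z_{p+1}, z_{p+2}, \ldots, z_{qp+p+n-1}$, where $z'_{1ij} = \sum_{k=1}^{p} z_{k,\,ij} = x_{1ij}$, we have
for the expected value $\eta'_{ij}$ of $y_{ij}$,

$$\eta'_{ij} = \beta_1' z'_{1ij} + \sum_{u=2}^{q}\sum_{k=1}^{p} \beta_{uk}\, z_{p(u-1)+k,\,ij} + \sum_{l=1}^{p-1} \gamma_l\, z_{qp+l,\,ij} + \sum_{r=1}^{n} \alpha_r\, z_{qp+p+r-1,\,ij}.$$

Suppose the sample estimates of $\beta_1'$, $\beta_{ui}$, $\gamma_i$, $\alpha_j$ and $\eta'_{ij}$ are respectively $b_1'$, $b'_{ui}$, $c'_i$, $a'_j$ and $Y'_{ij}$.
The normal equations, after eliminating the $c'_i$ and $a'_j$, then reduce to

$$b_1' \sum_i \Big(P_{11ii} - \frac{1}{p}\sum_{k=1}^{p} P_{11ik}\Big) + \sum_i\sum_{u=2}^{q} \Big(b'_{ui} P_{u1ii} - \frac{1}{p}\sum_{k=1}^{p} b'_{uk} P_{u1ki}\Big) = \sum_i \Big(Q_{1ii} - \frac{1}{p}\sum_{k=1}^{p} Q_{1ik}\Big),$$
$$b_1' \Big(P_{1vii} - \frac{1}{p}\sum_{k=1}^{p} P_{1vki}\Big) + \sum_{u=2}^{q} \Big(b'_{ui} P_{uvii} - \frac{1}{p}\sum_{k=1}^{p} b'_{uk} P_{uvki}\Big) = Q_{vii} - \frac{1}{p}\sum_{k=1}^{p} Q_{vik} \tag{3-6}$$
$(v = 2,3,\ldots,q;\ i = 1,2,\ldots,p)$,

or

$$\sum_{i,k=1}^{p} \Big(b_1' P'_{11ik} + \sum_{u=2}^{q} b'_{uk} P'_{u1ki}\Big) = \sum_{i,k=1}^{p} Q'_{1ik},$$
$$\sum_{k=1}^{p} \Big(b_1' P'_{1vki} + \sum_{u=2}^{q} b'_{uk} P'_{uvki}\Big) = \sum_{k=1}^{p} Q'_{vik} \quad (v = 2,3,\ldots,q;\ i = 1,2,\ldots,p). \tag{3-6'}$$

The regression estimates $b_1'$, $b'_{ui}$ $(u = 2,3,\ldots,q;\ i = 1,2,\ldots,p)$ are obtained as the solutions of
these $\{p(q-1)+1\}$ simultaneous equations.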













The residual sum of squares for this regression is $\sum_i\sum_j (y_{ij} - Y'_{ij})^2$, corresponding to the $S_r$ of
(2-4). Proceeding as before we obtain

$$\sum_i\sum_j (y_{ij} - Y'_{ij})^2 = \sum_{i,k=1}^{p} \Big(R'_{ik} - b_1' Q'_{1ik} - \sum_{u=2}^{q} b'_{ui} Q'_{uik}\Big)$$
$$= \sum_{i,k=1}^{p} \Big(R'_{ik} - b_1'^{\,2} P'_{11ik} - 2 b_1' \sum_{u=2}^{q} b'_{uk} P'_{1uik} - \sum_{u,v=2}^{q} b'_{ui} b'_{vk} P'_{uvik}\Big). \tag{3-7}$$

If the hypothesis $H_1$ is true, this sum yields an estimate of $\sigma^2$ based on
$\{np - (pq+n)\} = \{n(p-1) - pq\}$
degrees of freedom. To test the homogeneity of the $b_{1i}$ $(i = 1,2,\ldots,p)$, therefore, we consider
the identity corresponding to (2-4):

$$\sum_i\sum_j (y_{ij} - Y'_{ij})^2 = \sum_i\sum_j (y_{ij} - Y_{ij})^2 + \Big\{\sum_i\sum_j (y_{ij} - Y'_{ij})^2 - \sum_i\sum_j (y_{ij} - Y_{ij})^2\Big\}, \tag{3-8}$$

whose members have $\{n(p-1) - pq\}$, $\{(n-1)(p-1) - pq\}$ and $\{p-1\}$ degrees of freedom
respectively. The ratio of the two mean squares for the right-hand members may be referred
to the variance-ratio distribution, as in the case of a simple analysis of variance.

In similar manner we may test separately the homogeneity of $b_{2i}, b_{3i}, \ldots, b_{qi}$ $(i = 1,2,\ldots,p)$.

Suppose now it is desired to test the overall homogeneity of all the
$b_{1i}, b_{2i}, \ldots, b_{qi}$ $(i = 1,2,\ldots,p)$
together, that is to test the hypothesis, $H_2$ say, that
$\beta_{ui} = \beta_u$, say $(u = 1,2,\ldots,q;\ i = 1,2,\ldots,p)$.

This is, of course, equivalent to testing the homogeneity of the multiple regressions in the $p$
correlated samples. We require the residual sum of squares for the multiple regression of $y$ on
the $(q+p+n-1)$ independent variates $z_1', z_2', \ldots, z_q', z_{qp+1}, \ldots, z_{qp+p+n-1}$, where

$$z'_{uij} = \sum_{k=1}^{p} z_{p(u-1)+k,\,ij} = x_{uij} \quad (u = 1,2,\ldots,q).$$

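Proceeding as before, the equations for determining the estimates $\bar b_u$ of $\beta_u$ are

$$\sum_i\sum_{u=1}^{q} \bar b_u \Big(P_{uvii} - \frac{1}{p}\sum_{k=1}^{p} P_{uvki}\Big) = \sum_i \Big(Q_{vii} - \frac{1}{p}\sum_{k=1}^{p} Q_{vik}\Big) \quad (v = 1,2,\ldots,q), \tag{3-9}$$

or

$$\sum_{i,k=1}^{p}\sum_{u=1}^{q} \bar b_u P'_{uvki} = \sum_{i,k=1}^{p} Q'_{vik} \quad (v = 1,2,\ldots,q). \tag{3-9'}$$

Denoting the sample estimate of the expected value of $y_{ij}$ for this regression by $\bar Y_{ij}$, the
residual sum of squares is

$$\sum_i\sum_j (y_{ij} - \bar Y_{ij})^2 = \sum_{i,k=1}^{p} \Big(R'_{ik} - \sum_{u=1}^{q} \bar b_u Q'_{uik}\Big) = \sum_{i,k=1}^{p} \Big(R'_{ik} - \sum_{u,v=1}^{q} \bar b_u \bar b_v P'_{uvik}\Big). \tag{3-10}$$

On the hypothesis $H_2$, this sum provides an estimate of $\sigma^2$ based on
$\{np - (q+p+n-1)\} = \{(n-1)(p-1) - q\}$
degrees of freedom. The appropriate identity, corresponding to (2-4), for testing the hypothesis
$H_2$ is

$$\sum_i\sum_j (y_{ij} - \bar Y_{ij})^2 = \sum_i\sum_j (y_{ij} - Y_{ij})^2 + \Big\{\sum_i\sum_j (y_{ij} - \bar Y_{ij})^2 - \sum_i\sum_j (y_{ij} - Y_{ij})^2\Big\}, \tag{3-11}$$

with $\{(n-1)(p-1) - q\}$, $\{(n-1)(p-1) - pq\}$ and $\{q(p-1)\}$ degrees of freedom respectively.
On substitution from (3-10) and (3-4'), and simplification of the last member, the identity
may be written

$$\sum_{i,k=1}^{p} \Big(R'_{ik} - \sum_{u,v=1}^{q} \bar b_u \bar b_v P'_{uvik}\Big) = \sum_{i,k=1}^{p} \Big(R'_{ik} - \sum_{u,v=1}^{q} b_{ui} b_{vk} P'_{uvik}\Big)
+ \sum_{i,k=1}^{p}\sum_{u,v=1}^{q} (b_{ui} - \bar b_u)(b_{vk} - \bar b_v) P'_{uvik}. \tag{3-12}$$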







The formulae developed for the cases $H_1$ and $H_2$ above provide tests of the homogeneity of
the partial regression coefficients in correlated samples. As particular cases, we may wish to
determine whether the partial regression of $y$ on a particular variate, $x_1$ say, or the multiple
regression of $y$ on all variates, is significant at all. The corresponding null hypotheses are $H_1'$,
that $\beta_{1i} = 0$ $(i = 1,2,\ldots,p)$ and $H_2'$, that $\beta_{ui} = 0$ $(u = 1,2,\ldots,q;\ i = 1,2,\ldots,p)$, and the tests
follow from the corollary to the general result of § 2.


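Referring to the regression equation (3-2), it is easily shown that the residual sum of
squares for the regression of $y$ on $z_{p+1}, z_{p+2}, \ldots, z_{qp+p+n-1}$ is

$$\sum_i\sum_j (y_{ij} - Y''_{ij})^2 = \sum_{i,k=1}^{p} \Big(R'_{ik} - \sum_{u=2}^{q} b''_{ui} Q'_{uik}\Big) = \sum_{i,k=1}^{p} \Big(R'_{ik} - \sum_{u,v=2}^{q} b''_{ui} b''_{vk} P'_{uvik}\Big), \tag{3-13}$$

where

$$\sum_{u=2}^{q}\sum_{k=1}^{p} b''_{uk} P'_{uvki} = \sum_{k=1}^{p} Q'_{vik} \quad (v = 2,3,\ldots,q;\ i = 1,2,\ldots,p), \tag{3-14}$$

and the residual sum of squares for the regression of $y$ on $z_{qp+1}, z_{qp+2}, \ldots, z_{qp+p+n-1}$ is

$$\sum_i\sum_j (y_{ij} - Y'''_{ij})^2 = \sum_i\sum_j (y_{ij} - y_{i.} - y_{.j} + y_{..})^2 = \sum_{i,k=1}^{p} R'_{ik}. \tag{3-15}$$

Hence the appropriate identities corresponding to (2-5) are

for $H_1'$:
$$\sum_i\sum_j (y_{ij} - Y''_{ij})^2 = \sum_i\sum_j (y_{ij} - Y_{ij})^2 + \Big\{\sum_i\sum_j (y_{ij} - Y''_{ij})^2 - \sum_i\sum_j (y_{ij} - Y_{ij})^2\Big\}, \tag{3-16}$$
with $(n-1)(p-1) - p(q-1)$, $(n-1)(p-1) - pq$ and $p$ degrees of freedom respectively;

for $H_2'$:
$$\sum_i\sum_j (y_{ij} - Y'''_{ij})^2 = \sum_i\sum_j (y_{ij} - Y_{ij})^2 + \Big\{\sum_i\sum_j (y_{ij} - Y'''_{ij})^2 - \sum_i\sum_j (y_{ij} - Y_{ij})^2\Big\},$$
or
$$\sum_{i,k=1}^{p} R'_{ik} = \sum_{i,k=1}^{p} \Big(R'_{ik} - \sum_{u,v=1}^{q} b_{ui} b_{vk} P'_{uvik}\Big) + \sum_{i,k=1}^{p}\sum_{u,v=1}^{q} b_{ui} b_{vk} P'_{uvik}, \tag{3-17}$$
with $(n-1)(p-1)$, $(n-1)(p-1) - pq$ and $pq$ degrees of freedom respectively.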
The relevant tests of significance of the regressions and of their homogeneity may be
combined in an analysis of variance table. Thus, for testing the hypotheses $H_2$ and $H_2'$, we have:

Source of variation | Degrees of freedom | Sum of squares | Mean square
Due to pooled regressions | $q$ | $\sum_{i,k=1}^{p}\sum_{u,v=1}^{q} \bar b_u \bar b_v P'_{uvik}$ |
Differences between separate and pooled regressions | $q(p-1)$ | $\sum_{i,k=1}^{p}\sum_{u,v=1}^{q} (b_{ui} - \bar b_u)(b_{vk} - \bar b_v) P'_{uvik}$ | $s_1^2$
Due to separate regressions | $pq$ | $\sum_{i,k=1}^{p}\sum_{u,v=1}^{q} b_{ui} b_{vk} P'_{uvik}$ | $s_2^2$
Deviations from separate regressions | $(n-1)(p-1) - pq$ | $\sum_{i,k=1}^{p} \big(R'_{ik} - \sum_{u,v=1}^{q} b_{ui} b_{vk} P'_{uvik}\big)$ | $s^2$
Total | $(n-1)(p-1)$ | $\sum_{i,k=1}^{p} R'_{ik}$ |

























To test the significance of the multiple regressions (cf. (3-17)) we test the variance-ratio
$s_2^2/s^2$, with $pq$ and $(n-1)(p-1) - pq$ degrees of freedom. To test the homogeneity of the
multiple regressions, the appropriate variance-ratio (see (3-12)) is $s_1^2/s^2$, with $q(p-1)$ and
$(n-1)(p-1) - pq$ degrees of freedom.

There remains the problem of estimating the separate variances and covariances of the
individual partial regression coefficients, which are required in testing the significance of any
particular one (from an assigned value) or of the difference between any two. Regarding
$b_{ui}$, $c_i$, $a_j$ as the estimates of the partial regression coefficients in the regression equation (3-2),
we obtain in the usual manner:

estimated variance of $b_{ui}$ $= s^2 g_{p(u-1)+i,\,p(u-1)+i}$,
estimated covariance of $b_{ui}$, $b_{vk}$ $= s^2 g_{p(u-1)+i,\,p(v-1)+k}$ $(u, v = 1,2,\ldots,q;\ i, k = 1,2,\ldots,p)$, (3-18)

where $s^2$ is given by (3-5) and $(g_{\lambda\mu})$ $(\lambda, \mu = 1,2,\ldots,qp+p+n-1)$ is the inverse of the (symmetrical)
matrix $(f_{\lambda\mu})$ of coefficients of $b_{ui}$, $c_i$, $a_j$ in the normal equations derived from (3-2).
The determinant $|f_{\lambda\mu}| = F$ say, reduces to

$$F = n^{p-1} p^{n+1} D,$$

where $D = |d_{p(u-1)+i,\,p(v-1)+k}|$ and $d_{p(u-1)+i,\,p(v-1)+k} = P'_{uvik}$.
The cofactor of $f_{p(u-1)+i,\,p(v-1)+k}$ is similarly found to be

$$F_{p(u-1)+i,\,p(v-1)+k} = n^{p-1} p^{n+1} D_{p(u-1)+i,\,p(v-1)+k}.$$

Hence $g_{p(u-1)+i,\,p(v-1)+k} = D_{p(u-1)+i,\,p(v-1)+k}/D$,
or we may write

$$(g_{p(u-1)+i,\,p(v-1)+k}) = (P'_{uvik})^{-1}. \tag{3-19}$$

It is known that, under the assumptions of normality and no error in the $x$’s, the $b_{ui}$ are
normally distributed (for each $b_{ui}$ may be expressed as a linear function of a number of
independent normal variates) about means $\beta_{ui}$. Hence given the estimates of their variances
and covariances (3-18), any coefficient, or the difference between any two, may be tested
for significance by ‘Student’s’ $t$-test. This enables us to test the significance (from zero, say)
of any $b_{ui}$, of the difference $(b_{ui} - b_{uk})$ $(k \ne i)$, or of the difference $(b_{ui} - b_{vi})$ $(v \ne u)$.


4. SPECIAL CASES 
4-1. Single independent variate (q = 1) 
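Denote the observed values by $(y_{ij}, x_{ij})$ $(i = 1,2,\ldots,p;\ j = 1,2,\ldots,n)$, and write

$$P_{ik} = \sum_j (x_{ij} - x_{i.})(x_{kj} - x_{k.}), \qquad P'_{ik} = \left(\delta_{ik} - \frac{1}{p}\right) P_{ik},$$
$$Q_{ik} = \sum_j (x_{ij} - x_{i.})(y_{kj} - y_{k.}), \qquad Q'_{ik} = \left(\delta_{ik} - \frac{1}{p}\right) Q_{ik},$$
$$R_{ik} = \sum_j (y_{ij} - y_{i.})(y_{kj} - y_{k.}), \qquad R'_{ik} = \left(\delta_{ik} - \frac{1}{p}\right) R_{ik}$$

$(i, k = 1,2,\ldots,p)$, where $\delta_{ik}$ is, as before, the Kronecker delta, $\delta_{ik} = 1$ $(k = i)$, $= 0$ $(k \ne i)$.

The required formulae follow immediately from the results of the previous section,
references to which are made in the left-hand margin.

The equations for determining the $b_i$ $(i = 1,2,\ldots,p)$, and the estimate $s^2$ of the residual
variance are:

(3-3) $$b_i P_{ii} - \frac{1}{p}\sum_{k=1}^{p} b_k P_{ik} = Q_{ii} - \frac{1}{p}\sum_{k=1}^{p} Q_{ik} \quad (i = 1,2,\ldots,p) \tag{4-11}$$

(3-3') or $$\sum_{k=1}^{p} b_k P'_{ik} = \sum_{k=1}^{p} Q'_{ik}, \tag{4-11'}$$

(3-4') $$\sum_i\sum_j (y_{ij} - Y_{ij})^2 = \sum_{i,k=1}^{p} (R'_{ik} - b_i Q'_{ik}) = \sum_{i,k=1}^{p} (R'_{ik} - b_i b_k P'_{ik}), \tag{4-12}$$

(3-5) $$s^2 = \sum_i\sum_j (y_{ij} - Y_{ij})^2 / \{(n-1)(p-1) - p\}. \tag{4-13}$$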
vj 
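To test the homogeneity of the $b_i$, i.e. the hypothesis that $\beta_1 = \beta_2 = \ldots = \beta_p = \beta$ say, we
require the estimate $\bar b$ of $\beta$, and the residual sum of squares $\sum_i\sum_j (y_{ij} - \bar Y_{ij})^2$:

(3-6) $$\bar b \sum_i \Big(P_{ii} - \frac{1}{p}\sum_{k=1}^{p} P_{ik}\Big) = \sum_i \Big(Q_{ii} - \frac{1}{p}\sum_{k=1}^{p} Q_{ik}\Big), \tag{4-14}$$

(3-6') or $$\bar b \sum_{i,k=1}^{p} P'_{ik} = \sum_{i,k=1}^{p} Q'_{ik}, \tag{4-14'}$$

(3-7) $$\sum_i\sum_j (y_{ij} - \bar Y_{ij})^2 = \sum_{i,k=1}^{p} (R'_{ik} - \bar b\, Q'_{ik}) = \sum_{i,k=1}^{p} (R'_{ik} - \bar b^2 P'_{ik}). \tag{4-15}$$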
To test the homogeneity of the $b_i$, we employ the identity, corresponding to (2-4)

(3-12) $$\sum_{i,k=1}^{p} (R'_{ik} - \bar b^2 P'_{ik}) = \sum_{i,k=1}^{p} (R'_{ik} - b_i b_k P'_{ik}) + \sum_{i,k=1}^{p} (b_i - \bar b)(b_k - \bar b) P'_{ik}, \tag{4-16}$$

with $n(p-1) - p$, $(n-1)(p-1) - p$ and $p-1$ degrees of freedom respectively.
To test whether the regression is in fact significant, the appropriate identity is

(3-17) $$\sum_{i,k=1}^{p} R'_{ik} = \sum_{i,k=1}^{p} (R'_{ik} - b_i b_k P'_{ik}) + \sum_{i,k=1}^{p} b_i b_k P'_{ik}, \tag{4-17}$$

with $(n-1)(p-1)$, $(n-1)(p-1) - p$ and $p$ degrees of freedom.
The two tests may be combined in analysis of variance form:

Source of variation | Degrees of freedom | Sum of squares | Mean square
Due to pooled regression | $1$ | $\bar b^2 \sum_{i,k=1}^{p} P'_{ik}$ |
Differences between separate regressions and pooled regression | $p-1$ | $\sum_{i,k=1}^{p} (b_i - \bar b)(b_k - \bar b) P'_{ik}$ | $s_1^2$
Due to separate regressions | $p$ | $\sum_{i,k=1}^{p} b_i b_k P'_{ik}$ | $s_2^2$
Deviations from separate regressions | $(n-1)(p-1) - p$ | $\sum_{i,k=1}^{p} (R'_{ik} - b_i b_k P'_{ik})$ | $s^2$
Total | $(n-1)(p-1)$ | $\sum_{i,k=1}^{p} R'_{ik}$ |






















To test the significance of the regression, and the homogeneity of the separate regressions, we
test the variance ratios $s_2^2/s^2$ and $s_1^2/s^2$ (with degrees of freedom as indicated) respectively.
Finally, we have:

estimated variance of $b_i$ $= s^2 g_{ii}$,
estimated covariance of $b_i$ and $b_k$ $= s^2 g_{ik}$ $(i, k = 1,2,\ldots,p)$, (4-18)

where the matrix $(g_{ik})$ is given by

(3-19) $$(g_{ik}) = (P'_{ik})^{-1}. \tag{4-19}$$


4-2. The case of two samples (p = 2) with q independent variates 
Some simplification of the formulae is possible when $p = 2$. We note that
$$P'_{uvik} = \tfrac12(-1)^{i+k}P_{uvik}, \qquad Q'_{uik} = \tfrac12(-1)^{i+k}Q_{uik}, \qquad R'_{ik} = \tfrac12(-1)^{i+k}R_{ik} \qquad (i, k = 1, 2).$$


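From the relevant equations of § 3 we derive the following results:

(3·3')
$$\sum_{u=1}^{q}\sum_{k=1}^{2}\tfrac12(-1)^{i+k}\,b_{uk}P_{uvki} = \sum_{k=1}^{2}\tfrac12(-1)^{i+k}\,Q_{vik},$$
i.e.
$$\sum_{u=1}^{q}\sum_{k=1}^{2}(-1)^{k}\,b_{uk}P_{uvki} = \sum_{k=1}^{2}(-1)^{k}\,Q_{vik} \qquad (v = 1, 2, \ldots, q;\ i = 1, 2), \tag{4·21}$$

(3·4')
$$(n-2q-1)s^2 = \sum_{i,j}(y_{ij} - Y_{ij})^2 = \tfrac12\sum_{i,k=1}^{2}(-1)^{i+k}\Big(R_{ik} - \sum_{u=1}^{q} b_{ui}Q_{uik}\Big) = \tfrac12\sum_{i,k=1}^{2}(-1)^{i+k}\Big(R_{ik} - \sum_{u,v=1}^{q} b_{ui}b_{vk}P_{uvik}\Big), \tag{4·22}$$

(3·6)
$$\sum_{i,k=1}^{2}(-1)^{i+k}\Big(b_{1\cdot}P_{11ki} + \sum_{u=2}^{q} b_{uk}P_{u1ki}\Big) = \sum_{i,k=1}^{2}(-1)^{i+k}Q_{1ik},$$
$$\sum_{k=1}^{2}(-1)^{k}\Big(b_{1\cdot}P_{1vki} + \sum_{u=2}^{q} b_{uk}P_{uvki}\Big) = \sum_{k=1}^{2}(-1)^{k}Q_{vik} \qquad (v = 2, 3, \ldots, q;\ i = 1, 2), \tag{4·23}$$

(3·7)
$$\sum_{i,j}(y_{ij} - Y'_{ij})^2 = \tfrac12\sum_{i,k=1}^{2}(-1)^{i+k}\Big(R_{ik} - b_{1\cdot}Q_{1ik} - \sum_{u=2}^{q} b_{ui}Q_{uik}\Big)$$
$$\qquad = \tfrac12\sum_{i,k=1}^{2}(-1)^{i+k}\Big(R_{ik} - b_{1\cdot}^2P_{11ik} - 2b_{1\cdot}\sum_{u=2}^{q} b_{uk}P_{1uik} - \sum_{u,v=2}^{q} b_{ui}b_{vk}P_{uvik}\Big), \tag{4·24}$$

(3·9'')
$$\sum_{i,k=1}^{2}\sum_{u=1}^{q}(-1)^{i+k}\,b_{u\cdot}P_{uvki} = \sum_{i,k=1}^{2}(-1)^{i+k}Q_{vik} \qquad (v = 1, 2, \ldots, q), \tag{4·25}$$

(3·10)
$$\sum_{i,j}(y_{ij} - Y''_{ij})^2 = \tfrac12\sum_{i,k=1}^{2}(-1)^{i+k}\Big(R_{ik} - \sum_{u=1}^{q} b_{u\cdot}Q_{uik}\Big) = \tfrac12\sum_{i,k=1}^{2}(-1)^{i+k}\Big(R_{ik} - \sum_{u,v=1}^{q} b_{u\cdot}b_{v\cdot}P_{uvik}\Big). \tag{4·26}$$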


To test the homogeneity of the $b_{1i}$ ($i = 1, 2$), i.e. to test the significance of the difference $|b_{11} - b_{12}|$, we employ the identity (3·8) where now $\sum_{i,j}(y_{ij} - Y'_{ij})^2$ and $\sum_{i,j}(y_{ij} - Y_{ij})^2$ are given by (4·24) and (4·22); the degrees of freedom being $n - 2q$, $n - 2q - 1$ and 1 respectively.
To test the overall homogeneity of all the $b_{1i}, b_{2i}, \ldots, b_{qi}$ ($i = 1, 2$), which is equivalent to










A. H. Carter 37 


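testing jointly the significance of the differences $(b_{u1} - b_{u2})$ ($u = 1, 2, \ldots, q$), the appropriate identity corresponding to (2·4) is

(3·12)
$$\tfrac12\sum_{i,k=1}^{2}(-1)^{i+k}\Big(R_{ik} - \sum_{u,v=1}^{q} b_{u\cdot}b_{v\cdot}P_{uvik}\Big) = \tfrac12\sum_{i,k=1}^{2}(-1)^{i+k}\Big(R_{ik} - \sum_{u,v=1}^{q} b_{ui}b_{vk}P_{uvik}\Big)$$
$$\qquad + \tfrac12\sum_{i,k=1}^{2}\sum_{u,v=1}^{q}(-1)^{i+k}(b_{ui} - b_{u\cdot})(b_{vk} - b_{v\cdot})P_{uvik}, \tag{4·27}$$

with $n - q - 1$, $n - 2q - 1$ and $q$ degrees of freedom respectively.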
Note. In practice, the following alternative method of evaluating and testing the partial regression coefficients in two correlated samples may be found useful. The $b_{u1}$, $b_{u2}$ ($u = 1, 2, \ldots, q$) are identical with the estimates of the partial regression coefficients of $y'\ (= y_1 - y_2)$ on the $2q$ independent variates $x'_{u1}\ (= x_{u1})$, $x'_{u2}\ (= -x_{u2})$. The normal equations for determining the coefficients in this regression are

$$\sum_{u=1}^{q}\sum_{k=1}^{2} b_{uk}\sum_{j}(x'_{ukj} - \bar{x}'_{uk\cdot})(x'_{vij} - \bar{x}'_{vi\cdot}) = \sum_{j}(x'_{vij} - \bar{x}'_{vi\cdot})(y'_j - \bar{y}') \qquad (v = 1, 2, \ldots, q;\ i = 1, 2).$$

Substituting
$$x'_{uij} = (-1)^{i+1}x_{uij}, \qquad y'_j = \sum_{k=1}^{2}(-1)^{k+1}y_{kj},$$
and similarly for the means, these reduce to (4·21). It may readily be shown that all the tests relating to the $b_{1i}, b_{2i}, \ldots, b_{qi}$ ($i = 1, 2$) when considered in this manner are identical with those given above.
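The equivalence stated in the Note can be checked numerically for the simplest case $p = 2$, $q = 1$. The sketch below is only illustrative: the data are invented, and the function names `related_pair_slopes` and `difference_regression_slopes` are hypothetical. The first function solves the two normal equations for the related samples directly; the second regresses $y_1 - y_2$ on $x_1$ and $-x_2$ (with an intercept).

```python
def centered(values):
    """Deviations of a list of numbers from its mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def related_pair_slopes(x1, y1, x2, y2):
    """Slopes b1, b2 for two related samples, from the two normal equations with p = 2."""
    cx1, cy1, cx2, cy2 = map(centered, (x1, y1, x2, y2))
    P11, P12, P22 = dot(cx1, cx1), dot(cx1, cx2), dot(cx2, cx2)
    # Q_ik pairs the x of sample i with the y of sample k.
    Q11, Q12 = dot(cx1, cy1), dot(cx1, cy2)
    Q21, Q22 = dot(cx2, cy1), dot(cx2, cy2)
    D = P11 * P22 - P12 ** 2
    b1 = (P22 * (Q11 - Q12) + P12 * (Q22 - Q21)) / D
    b2 = (P11 * (Q22 - Q21) + P12 * (Q11 - Q12)) / D
    return b1, b2

def difference_regression_slopes(x1, y1, x2, y2):
    """Regress y' = y1 - y2 on x1' = x1 and x2' = -x2, with an intercept."""
    yd = centered([a - b for a, b in zip(y1, y2)])
    u = centered(x1)
    w = centered([-a for a in x2])
    Suu, Suw, Sww = dot(u, u), dot(u, w), dot(w, w)
    Suy, Swy = dot(u, yd), dot(w, yd)
    det = Suu * Sww - Suw ** 2          # 2 x 2 normal equations, Cramer's rule
    return (Suy * Sww - Swy * Suw) / det, (Swy * Suu - Suy * Suw) / det

# Invented illustrative data: six members measured in each of two related samples.
x1 = [1.0, 2.0, 4.0, 5.0, 7.0, 8.0]
y1 = [2.1, 2.9, 5.2, 5.8, 8.1, 9.0]
x2 = [2.0, 1.0, 5.0, 3.0, 8.0, 6.0]
y2 = [1.0, 0.8, 3.1, 2.2, 4.9, 3.7]

direct = related_pair_slopes(x1, y1, x2, y2)
via_difference = difference_regression_slopes(x1, y1, x2, y2)
print(direct, via_difference)
```

The two pairs of slopes agree to rounding error, because with two samples the residual sum of squares after removing the common member effect is half the sum of squares of the differences of the residuals.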


4-3. Two samples with a single independent variate (p = 2,q = 1) 


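The appropriate results follow easily from § 4·1. The two equations (4·11) ($i = 1, 2$) may be immediately solved for $b_1$, $b_2$, giving

$$\left.\begin{aligned} b_1 &= D^{-1}\{P_{22}(Q_{11} - Q_{12}) + P_{12}(Q_{22} - Q_{21})\},\\ b_2 &= D^{-1}\{P_{11}(Q_{22} - Q_{21}) + P_{12}(Q_{11} - Q_{12})\},\end{aligned}\right\} \tag{4·31}$$

where
$$D = \begin{vmatrix} P_{11} & -P_{12}\\ -P_{12} & P_{22}\end{vmatrix} = P_{11}P_{22} - P_{12}^2. \tag{4·32}$$

From (4·12), (4·13) we have
$$(n-3)s^2 = \sum_{i,j}(y_{ij} - Y_{ij})^2 = \tfrac12\{(R_{11} + R_{22} - 2R_{12}) - (b_1Q_{11} - b_1Q_{12} - b_2Q_{21} + b_2Q_{22})\}$$
$$\qquad = \tfrac12\{(R_{11} + R_{22} - 2R_{12}) - (b_1^2P_{11} + b_2^2P_{22} - 2b_1b_2P_{12})\}. \tag{4·33}$$

The estimate of the pooled regression is, from (4·14),
$$b = \{b_1(P_{11} - P_{12}) + b_2(P_{22} - P_{12})\}/(P_{11} + P_{22} - 2P_{12}) = (Q_{11} - Q_{12} - Q_{21} + Q_{22})/(P_{11} + P_{22} - 2P_{12}). \tag{4·34}$$

The identity (4·16) from which is derived the test of the significance of the difference $|b_1 - b_2|$ becomes
$$\tfrac12\{(R_{11} + R_{22} - 2R_{12}) - b^2(P_{11} + P_{22} - 2P_{12})\} = \tfrac12\{(R_{11} + R_{22} - 2R_{12}) - (b_1^2P_{11} + b_2^2P_{22} - 2b_1b_2P_{12})\}$$
$$\qquad + \tfrac12\{(b_1 - b_2)^2D/(P_{11} + P_{22} - 2P_{12})\} \tag{4·35}$$
with degrees of freedom $n - 2$, $n - 3$ and 1 respectively.

Finally, (4·19) becomes
$$(g_{ik}) = \begin{pmatrix} \tfrac12P_{11} & -\tfrac12P_{12}\\ -\tfrac12P_{12} & \tfrac12P_{22}\end{pmatrix}^{-1} = \frac{2}{D}\begin{pmatrix} P_{22} & P_{12}\\ P_{12} & P_{11}\end{pmatrix}. \tag{4·36}$$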










4-4. The case of independent samples 
If in the specification (2·1) we put $\alpha_j = 0$, we have the population model appropriate to the case of $p$ independent samples with $q$ independent variates, where the residual variances are equal. Proceeding in a manner similar to that of § 3, we easily derive the following known results. It is instructive to compare them with the corresponding results for related samples, which are referred to in the left-hand margin.


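The estimates of the partial regression coefficients are given by

(3·3')
$$\sum_{u=1}^{q} b_{ui}P_{uvii} = Q_{vii} \qquad (v = 1, 2, \ldots, q;\ i = 1, 2, \ldots, p). \tag{4·41}$$

For the estimate $s^2$ of the (constant) residual variance $\sigma^2$ we have

(3·4')
$$\{p(n-1) - pq\}s^2 = \sum_{i,j}(y_{ij} - Y_{ij})^2 = \sum_{i}\Big(R_{ii} - \sum_{u=1}^{q} b_{ui}Q_{uii}\Big) = \sum_{i}\Big(R_{ii} - \sum_{u,v=1}^{q} b_{ui}b_{vi}P_{uvii}\Big). \tag{4·42}$$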

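To test the homogeneity of the $b_{1i}$ ($i = 1, 2, \ldots, p$), we obtain

(3·6)
$$b_{1\cdot}\sum_{i}P_{11ii} + \sum_{i}\sum_{u=2}^{q} b_{ui}P_{u1ii} = \sum_{i}Q_{1ii},$$
$$b_{1\cdot}P_{1vii} + \sum_{u=2}^{q} b_{ui}P_{uvii} = Q_{vii} \qquad (v = 2, 3, \ldots, q;\ i = 1, 2, \ldots, p), \tag{4·43}$$

and

(3·7)
$$\sum_{i,j}(y_{ij} - Y'_{ij})^2 = \sum_{i}\Big(R_{ii} - b_{1\cdot}Q_{1ii} - \sum_{u=2}^{q} b_{ui}Q_{uii}\Big)$$
$$\qquad = \sum_{i}\Big(R_{ii} - b_{1\cdot}^2P_{11ii} - 2b_{1\cdot}\sum_{u=2}^{q} b_{ui}P_{1uii} - \sum_{u,v=2}^{q} b_{ui}b_{vi}P_{uvii}\Big), \tag{4·44}$$

and employ the identity (3·8), noting that $\sum_{i,j}(y_{ij} - Y'_{ij})^2$ is now based on $(np-1)-pq$ degrees of freedom, and the right-hand members have $(np-p)-pq$ and $p-1$ degrees of freedom respectively.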


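Finally, for the overall test of homogeneity among samples of all the $b_{1i}, b_{2i}, \ldots, b_{qi}$ ($i = 1, 2, \ldots, p$), we derive the results:

(3·9'')
$$\sum_{i}\sum_{u=1}^{q} b_{u\cdot}P_{uvii} = \sum_{i}Q_{vii} \qquad (v = 1, 2, \ldots, q), \tag{4·45}$$

(3·10)
$$\sum_{i,j}(y_{ij} - Y''_{ij})^2 = \sum_{i}\Big(R_{ii} - \sum_{u=1}^{q} b_{u\cdot}Q_{uii}\Big) = \sum_{i}\Big(R_{ii} - \sum_{u,v=1}^{q} b_{u\cdot}b_{v\cdot}P_{uvii}\Big). \tag{4·46}$$

The appropriate identity corresponding to (2·4) is then

(3·12)
$$\sum_{i}\Big(R_{ii} - \sum_{u,v=1}^{q} b_{u\cdot}b_{v\cdot}P_{uvii}\Big) = \sum_{i}\Big(R_{ii} - \sum_{u,v=1}^{q} b_{ui}b_{vi}P_{uvii}\Big) + \sum_{i}\sum_{u,v=1}^{q}(b_{ui} - b_{u\cdot})(b_{vi} - b_{v\cdot})P_{uvii}, \tag{4·47}$$

with $(np-p)-q$, $(np-p)-pq$ and $q(p-1)$ degrees of freedom respectively.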
















In the simple case $q = 1$, the above results reduce to the well-known formulae
$$b_iP_{ii} = Q_{ii} \qquad (i = 1, 2, \ldots, p),$$
$$p(n-2)s^2 = \sum_{i,j}(y_{ij} - Y_{ij})^2 = \sum_{i}(R_{ii} - b_i^2P_{ii}),$$
$$b\sum_{i}P_{ii} = \sum_{i}Q_{ii},$$
$$\sum_{i,j}(y_{ij} - Y'_{ij})^2 = \sum_{i}(R_{ii} - b^2P_{ii}),$$
the appropriate identity from which is derived the test of the homogeneity of the $b_i$ being
$$\sum_{i}(R_{ii} - b^2P_{ii}) = \sum_{i}(R_{ii} - b_i^2P_{ii}) + \sum_{i}(b_i - b)^2P_{ii},$$
with $p(n-1)-1$, $p(n-1)-p$ and $p-1$ degrees of freedom.
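The well-known formulae for independent samples with a single independent variate translate directly into a short routine. The following is a minimal sketch with invented data; the function name `slope_homogeneity` and the returned dictionary keys are hypothetical, and equal sample sizes $n$ are assumed, as in the text.

```python
def slope_homogeneity(samples):
    """samples: list of (x, y) pairs of equal-length lists, one pair per independent sample.

    Returns the separate slopes, the pooled slope, the three sums of squares
    of the identity above, and the variance ratio for homogeneity of slopes.
    """
    P, Q, R = [], [], []
    for x, y in samples:
        mx, my = sum(x) / len(x), sum(y) / len(y)
        P.append(sum((a - mx) ** 2 for a in x))
        Q.append(sum((a - mx) * (c - my) for a, c in zip(x, y)))
        R.append(sum((c - my) ** 2 for c in y))
    p, n = len(samples), len(samples[0][0])
    b_i = [q / pp for q, pp in zip(Q, P)]            # b_i P_ii = Q_ii
    b = sum(Q) / sum(P)                              # pooled slope
    within = sum(r - bi ** 2 * pp for r, bi, pp in zip(R, b_i, P))
    pooled = sum(r - b ** 2 * pp for r, pp in zip(R, P))
    between = sum((bi - b) ** 2 * pp for bi, pp in zip(b_i, P))
    s2 = within / (p * (n - 2))                      # p(n-1) - p degrees of freedom
    ratio = (between / (p - 1)) / s2
    return {"b_i": b_i, "b": b, "within": within, "pooled": pooled,
            "between": between, "ratio": ratio}

# Invented data: three independent samples of five observations each.
data = [
    ([1.0, 2.0, 3.0, 4.0, 5.0], [1.2, 1.9, 3.2, 3.8, 5.1]),
    ([2.0, 3.0, 5.0, 6.0, 9.0], [2.5, 3.1, 5.9, 6.4, 9.8]),
    ([1.0, 4.0, 5.0, 7.0, 8.0], [0.7, 2.4, 2.6, 4.1, 4.3]),
]
result = slope_homogeneity(data)
print(result["b_i"], result["b"], result["ratio"])
```

The variance ratio returned has $p - 1$ and $p(n-1) - p$ degrees of freedom; the pooled residual sum of squares equals the sum of the other two, as the identity requires.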
From the foregoing, it is seen that the results for independent samples are identical with those for non-independent samples provided we write
$$P'_{uvik} = \delta_{ik}P_{uvik}, \qquad Q'_{uik} = \delta_{ik}Q_{uik}, \qquad R'_{ik} = \delta_{ik}R_{ik},$$
which are equivalent to $P'_{uvik} = Q'_{uik} = R'_{ik} = 0$ ($k \neq i$), in the former case. Conversely, given the results for $p$ independent samples, we may readily derive those appropriate to $p$ related samples by replacing $P_{uvii}$ by $\sum_{k=1}^{p} P'_{uvik}$, etc.


5. Discussion 


Provided the specification (2-1) correctly describes the population from which the observed 
values are drawn, the estimates derived above are ‘ best’ estimates in the maximum likelihood 
sense, and the resulting tests are exact tests. It is important to bear in mind the underlying 
assumptions implied in the specification, the validity of which necessarily conditions the 
applicability of the results. 

The assumption of normality of residual variation is one frequently accepted, and one 
which has a satisfactory empirical basis in many biological fields. 

The condition that the correlation between samples is due to an additive effect common to 
corresponding members would appear to be a reasonable one in the type of cases considered 
in §1. If there were grounds for believing the samples to be related in some other manner, 
a suitable transformation might be applied, to ensure that the condition held. 

There is the further requirement that the effects of the two factors of classification— 
samples on the one hand, individual sample members on the other—be independent, i.e. that 
there is no interaction. In the sugar beet experiment quoted in § 1, for example, this requires 
that the several localities do not exert differential effects on the different varieties, i.e. that 
locality-variety interaction is negligible. Whether the assumption is justifiable or not will 
depend on the circumstances of any particular problem. Further consideration would be 
required for the more general case where interaction exists. 

Implicit throughout this paper has been the supposition that the independent variates 
(the x’s) were not subject to error. In deriving the tests of homogeneity and significance, we 
have in effect considered the sampling distribution of a test criterion (a variance-ratio or 










‘Student’s’ $t$) for repeated sampling from a population in which the $x$’s were held fixed. The sampling distribution of this criterion, however, is clearly seen to be independent of the $x$’s. Provided the relation (2·1) holds, therefore, the tests will be valid irrespective of the distribution of the $x$’s.

An assumption which might appear more difficult to justify in practice is that concerning 
the equality of the residual variance for each sample. Since this variance is estimated for the 
totality of samples but not for each separately, no standard test can be made of the homo- 
geneity of the separate residual variances. The assumption of equal residual variance is, of 
course, the same as that made in testing the homogeneity of the means of correlated samples 
in an analysis of variance. 

A case which requires special consideration is that where one or more of the independent 
variates are the same for all samples. If the formulae given in the preceding sections were 
applied in this case, it would be found that the equations for estimating the regression coeffi- 
cients were indeterminate, and we conclude that the specification of the population adopted 
is inappropriate. To satisfy the criterion of equal residual variances, in fact, the correct 
specification would require that the regression coefficients on a variate which is the same for 
all samples are themselves the same. On logical grounds, it seems reasonable to suppose that 
the effect on the dependent variate of such a common independent variate, as measured by 
the relevant regression coefficient, is the same in all samples. In the general case (§3), if 
$x_{1ij} = x_{1\cdot j}$ ($i = 1, 2, \ldots, p$), for example, the appropriate specification of the population would require that $\beta_{1i} = \beta_{1\cdot}$, say ($i = 1, 2, \ldots, p$). Proceeding as before, tests of the homogeneity of
the regression coefficients on those of the independent variates which did vary from sample to 
sample could be derived. 

As mentioned in the Introduction, the problem of comparing the regression coefficients 
from two correlated samples has been considered by Yates. It is of interest to compare the 
method proposed by him with that developed in this paper: in the first of the examples which 
follow, where both methods have been applied, it will be observed that very different results 
are obtained. 

In the first place, it is to be noted that the actual estimates of the regression coefficients to 
be compared are themselves different. In the treatment adopted in this paper, the regression 
coefficients as estimated may be regarded as measuring the net effects of the independent 
variates, allowance being made for the correlation between samples. In Yates’s method, the 
estimates, computed for each sample separately, measure rather what might be termed 
‘crude’ regression effects, no account being taken of the correlation. To assess the validity of 
these estimates and of the derived tests, we must examine the underlying assumptions 
(concerning the parent population) which have been made. 

With Yates’s approach, the two samples, comprising, in the general case of $q$ independent variates, the observed values $(y_{ij}, x_{uij})$ ($u = 1, 2, \ldots, q$; $i = 1, 2$; $j = 1, 2, \ldots, n$) are supposed drawn from two populations specified by

$$y_{ij} = \sum_{u=1}^{q}\beta_{ui}x_{uij} + \gamma_i + \epsilon'_{ij} \qquad (i = 1, 2;\ j = 1, 2, \ldots, n), \tag{5·1}$$

where $\beta_{ui}$, $\gamma_i$, $x_{uij}$ are as in (2·1), the $\epsilon'_{ij}$ are random residuals normally distributed about zero means with (different) variances $\sigma_i'^2$ ($i = 1, 2$), and $\epsilon'_{1j}$, $\epsilon'_{2j}$ are correlated, their covariance being $\kappa$.










The estimates $b_{ui}$ of $\beta_{ui}$ are taken to be the solutions of the normal equations (appropriate to independent samples)

$$\sum_{j} x_{vij}\Big\{(y_{ij} - \bar{y}_{i\cdot}) - \sum_{u=1}^{q} b_{ui}(x_{uij} - \bar{x}_{ui\cdot})\Big\} = 0 \qquad (v = 1, 2, \ldots, q;\ i = 1, 2), \tag{5·2}$$

or (compare (4·41)) $\sum_{u=1}^{q} b_{ui}P_{uvii} = Q_{vii}$.

The difference $|b_{u1} - b_{u2}|$ ($u = 1, 2, \ldots, q$) is then normally distributed. Its standard error is not of course known, but must be estimated from the data. Were this estimate a simple one based on a known number of degrees of freedom, an exact $t$-test could be applied to test the significance of the difference. In fact, the estimate is a composite one, with component parts based on different degrees of freedom. Only if the samples were sufficiently large could the approximate normal test justifiably replace the exact $t$-test, and the significance of the difference $|b_{u1} - b_{u2}|$ be tested.

The precise interpretation of the estimates $b_{ui}$ given by (5·2) is, however, open to doubt. Were the samples independent they would be the maximum likelihood estimates of the partial regression coefficients $\beta_{ui}$; they are not maximum likelihood estimates under the specification (5·1). The maximum likelihood equations for this case are in fact complicated, not capable of algebraic solution even in the simplest case, $q = 1$.


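Comparing now the specification (5·1) with (2·1), we see that, if they are to be equivalent,

$$\epsilon'_{ij} = \alpha_j + \epsilon_{ij} \qquad (i = 1, 2),$$

where $\epsilon'_{1j}$, $\epsilon'_{2j}$ have different variances $\sigma_1'^2$, $\sigma_2'^2$, and are correlated, while $\epsilon_{1j}$, $\epsilon_{2j}$ have the same variance $\sigma^2$, and are independent. The population model adopted in § 2 is thus less general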
than that of Yates: on the one hand the correlation between the residuals is assumed due to 
a common additive factor; and on the other, the residual variances are assumed equal. The 
first restriction has already been discussed; as regards the second, further consideration 
would be required if the residual variances were presumed unequal. It may be remarked that, 
for the case of two samples only, the maximum likelihood estimates of the regression coeffi- 
cients when the specification (2-1) is extended to allow for different residual variances are in 
fact the same as those given in § 4-2. 

In conclusion we note that, provided the assumptions made concerning the parent popula- 
tion (§ 2) are valid, the method of estimating and comparing the regression coefficients from 
correlated samples presented in this paper has the following advantages: (1) the estimates are 
best estimates from the point of view of maximum likelihood, (2) the resulting tests are 
exact tests, and (3) the results are applicable to any number of samples. 

It is hoped to deal in a further paper with (1) the application of the theory in covariance 
analysis, (2) the case where interaction is assumed to exist, (3) the assumption of unequal 
residual variances, (4) the extension of the theory to a three-factor classification, (5) the case 


where one or more of the independent variates are the same for all samples, and (6) curvilinear 
regression. 


6. NUMERICAL ILLUSTRATIONS


Example 1. The following data, extracted from the Rothamsted Experimental Station 
Annual Reports, relate to the Broadbalk wheat plots: 













y denotes yield of grain (in bushels per acre). 

x denotes amount of straw (in cwt. per acre). 

I represents treatment 2, receiving farmyard manure (treatment 2A after 1922). 

II represents treatment 13, receiving nitrogen, potash and phosphates (artificial fertilizer). 





























                      I                   II
Year†            $y_1$    $x_1$      $y_2$    $x_2$

1908             38.6     32.2       36.0     29.6
1910             27.9     38.3       25.3     34.0
1912 (1)         16.9     17.6        6.1      9.5
1914 (1)         30.7     36.6       19.2     21.6
1916 (1)         33.3     41.3       25.1     35.8
1918 (2)         30.8     38.8       20.3     27.2
1920 (2)         28.3     38.4       24.9     29.6
1922 (2)         32.9     31.8       24.4     26.9
1924 (2)         10.3     18.6       15.0     21.2
1926 (1)          6.8     24.6        9.3     26.4
1928 (2)         41.1     51.3       55.2     56.2
1930 (3)         26.1     60.0       29.2     58.1
1932 (3)         10.1     42.6       11.0     46.3
1934 (3)         23.3     63.8       28.6     60.8
1936 (3)          7.3     34.8        9.4     26.4
1938 (4)         38.2     41.0       42.5     31.9





† Subsequent to 1910, the main treatment plots were subdivided. The numbers in brackets indicate the plots to which the figures given relate: (1) lower portion of field, (2) upper portion of field, (3) subplot V, (4) subplot VI. It is recognized that since the figures given under I and II do not in fact all relate to the same plots, the methods applied may not be strictly appropriate; though since the treatments are the same, the data are of value for the purpose of illustration.


Does the regression of yield of grain on amount of straw differ significantly for the two 
manurial treatments? Since the corresponding pairs of observations in each sample relate to 
the same year, the samples cannot be regarded as independent. It seems reasonable to 
suppose, however, that the relation between the two samples is due to a common additive 
‘year’ factor, i.e. the $\alpha_j$, and we apply the results of § 4·3. From the given data we find

$P_{11} = 2{,}423.97$,   $Q_{11} = 814.25$,     $R_{11} = 1{,}987.06$,
$P_{12} = 2{,}527.68$,   $Q_{12} = 1{,}363.84$,   $R_{12} = 1{,}875.19$,
$P_{22} = 3{,}119.18$,   $Q_{21} = 561.61$,     $R_{22} = 2{,}547.76$,
$D^{-1} = 0.8535 \times 10^{-6}$,   $Q_{22} = 1{,}545.59$.

From (4·31), we obtain $b_1 = 0.660$, $b_2 = 0.850$.
From (4·33), $s^2 = 11.945$ (13 degrees of freedom).
From (4·36),
$$(g_{ik}) = \begin{pmatrix} 0.0053 & 0.0043\\ 0.0043 & 0.0041\end{pmatrix}.$$
Hence finally,‡ $t_{b_1} = 2.62^{*}$, $t_{b_2} = 3.82^{**}$, $t_{(b_1 - b_2)} = 1.91$.

We conclude that $b_1$ and $b_2$ are significantly different from zero (5 and 1% levels respectively), while the difference $(b_1 - b_2)$ approaches significance (5% point = 2.16).
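The arithmetic of Example 1 can be retraced from the quoted sums of squares and products. The sketch below simply evaluates (4·31), (4·32), (4·33) and (4·36) and the three $t$ statistics; the variable names are illustrative, and small differences in the last digit from the printed values are to be expected from rounding.

```python
from math import sqrt

# Sums of squares and products quoted in Example 1 (n = 16 years).
n = 16
P11, P12, P22 = 2423.97, 2527.68, 3119.18
Q11, Q12, Q21, Q22 = 814.25, 1363.84, 561.61, 1545.59
R11, R12, R22 = 1987.06, 1875.19, 2547.76

D = P11 * P22 - P12 ** 2                                  # (4.32)
b1 = (P22 * (Q11 - Q12) + P12 * (Q22 - Q21)) / D          # (4.31)
b2 = (P11 * (Q22 - Q21) + P12 * (Q11 - Q12)) / D

# Residual variance from (4.33), on n - 3 degrees of freedom.
total = R11 + R22 - 2 * R12
explained = b1 * (Q11 - Q12) + b2 * (Q22 - Q21)
s2 = 0.5 * (total - explained) / (n - 3)

# Inverse matrix (4.36): g = (2 / D) * [[P22, P12], [P12, P11]].
g11, g12, g22 = 2 * P22 / D, 2 * P12 / D, 2 * P11 / D

t_b1 = b1 / sqrt(s2 * g11)
t_b2 = b2 / sqrt(s2 * g22)
t_diff = abs(b1 - b2) / sqrt(s2 * (g11 + g22 - 2 * g12))
print(round(b1, 3), round(b2, 3), round(s2, 3))
print(round(t_b1, 2), round(t_b2, 2), round(t_diff, 2))
```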


‡ We shall use the markings *, ** and ***, respectively, to denote significance at the 5, 1 and 0.1% levels.







Yates’s method. To avoid confusion, the regression estimates will be denoted by $b'_1$ and $b'_2$. In the present notation, for samples of size $n$ we have (writing var for ‘estimate of variance’)

$$b'_1 = Q_{11}/P_{11}, \qquad \operatorname{var}(b'_1) = (R_{11} - b'_1Q_{11})/\{(n-2)P_{11}\}, \qquad t_{b'_1} = b'_1/\sqrt{\operatorname{var}(b'_1)}$$

($n - 2$ degrees of freedom), and similarly for $b'_2$.


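The estimate of the variance of the difference $(b'_1 - b'_2)$ is given by

$$\operatorname{var}(b'_1 - b'_2) = \operatorname{var}(b'_1) + \operatorname{var}(b'_2) - \frac{2P_{12}}{(n-2)P_{11}P_{22}}\,(R_{12} - b'_1Q_{12} - b'_2Q_{21} + b'_1b'_2P_{12}).$$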





For the example quoted, therefore, we derive the results

$b'_1 = 0.336$,   $\operatorname{var}(b'_1) = 0.0505$,   $t_{b'_1} = 1.49$,
$b'_2 = 0.496$,   $\operatorname{var}(b'_2) = 0.0408$,   $t_{b'_2} = 2.45^{*}$
(14 degrees of freedom; 5% point is 2.145),

$(b'_2 - b'_1) = 0.1596$,   $\operatorname{var}(b'_1 - b'_2) = 0.0159$.

The estimate of the standard error of $(b'_1 - b'_2)$ is therefore 0.1261. On the assumption that $(b'_1 - b'_2)$ is normally distributed with true variance 0.0159 the difference is clearly far from significant. To sum up, $b'_2$ is significant at the 5% level, but neither $b'_1$ nor the difference $(b'_1 - b'_2)$ is significant.


Example 2. In an experiment to investigate wool growth, measurements of the areas of 
certain tattooed squares (initially the same size), and of the weight of wool produced from 
these squares, were obtained for a number of sheep, in successive seasons. In the following 


table are the figures relating to four such squares, on different parts of the same sheep, for the 
summer season. 


$y$ denotes the ‘clean’ weight of the wool cut from the tattooed square, $x$ denotes the area of the square. I, II, III, IV denote fore, back, hip, and britch regions respectively.























                  I               II              III             IV
Sheep no.    $y_1$  $x_1$    $y_2$  $x_2$    $y_3$  $x_3$    $y_4$  $x_4$

 1             82    71       117    98        78    70       62    72
 2             84    72       142    99        89    74       67    70
 3            160   107       198   111       148   102       96   105
 4             96    62       150    74        98    71       65    62
 5             88    82       154   117        98    85       62    73
 6             68    48        79    61        70    57       60    54
 7            113    70       157    96       114    67       85    66
 8            136   102       190   116       136    97       65    79
 9             86    51       118    68        93    60       54    63
10            144    92       161   106       140   116       97    86
11             64    65        80    62        75    60       42    68





























Is the regression of weight of wool per square on area of square the same for all regions? The 
four samples are clearly correlated, since each sheep contributes one pair of observations to 
each sample. We proceed to apply the results given in § 4-1, when p = 4, n = 11. 





































































































































$R_{ik}$
 i \ k        1         2         3         4       Total
   1       10,217    11,008     8,606     4,652    34,483
   2       11,008    15,225     9,561     4,746    40,540
   3        8,606     9,561     7,445     3,758    29,370
   4        4,652     4,746     3,758     2,937    16,093

$Q_{ik}$
 i \ k        1         2         3         4       Total
   1        5,229     6,309     4,490     2,061    18,089
   2        4,640     6,931     4,031     2,162    17,764
   3        5,375     5,913     4,624     2,448    18,360
   4        3,665     3,899     3,034     1,603    12,201

$P_{ik}$
 i \ k        1         2         3         4       Total
   1        3,714     3,550     3,345     2,400    13,009
   2        3,550     4,619     3,248     2,054    13,471
   3        3,345     3,248     3,769     2,219    12,581
   4        2,400     2,054     2,219     1,893     8,566

$P'_{ik}$
 i \ k        1         2         3         4
   1       2785.5    -887.5    -836.3    -600.0
   2       -887.5    3464.3    -812.0    -513.5
   3       -836.3    -812.0    2826.7    -554.7
   4       -600.0    -513.5    -554.7    1419.7

$$\sum_{k=1}^{4} Q'_{ik} = Q_{ii} - \tfrac14\sum_{k=1}^{4} Q_{ik};$$
hence
$$\sum_{k} Q'_{1k} = 5229 - \tfrac14(18{,}089) = 706.7, \qquad \sum_{k} Q'_{2k} = 6931 - \tfrac14(17{,}764) = 2490.0,$$
$$\sum_{k} Q'_{3k} = 4624 - \tfrac14(18{,}360) = 34.0, \qquad \sum_{k} Q'_{4k} = 1603 - \tfrac14(12{,}201) = -1447.3;$$
$$\sum_{i,k} P'_{ik} = \sum_{i} P_{ii} - \tfrac14\sum_{i,k} P_{ik} = 13{,}995 - \tfrac14(47{,}627) = 2088.25,$$
$$\sum_{i,k} Q'_{ik} = \sum_{i} Q_{ii} - \tfrac14\sum_{i,k} Q_{ik} = 18{,}387 - \tfrac14(66{,}414) = 1783.5,$$
$$\sum_{i,k} R'_{ik} = \sum_{i} R_{ii} - \tfrac14\sum_{i,k} R_{ik} = 35{,}824 - \tfrac14(120{,}486) = 5702.5.$$

The equations to solve for the $b_i$ (4·11') are therefore

$$\begin{aligned} 2785.5\,b_1 - 887.5\,b_2 - 836.3\,b_3 - 600.0\,b_4 &= 706.7,\\ -887.5\,b_1 + 3464.3\,b_2 - 812.0\,b_3 - 513.5\,b_4 &= 2490.0,\\ -836.3\,b_1 - 812.0\,b_2 + 2826.7\,b_3 - 554.7\,b_4 &= 34.0,\\ -600.0\,b_1 - 513.5\,b_2 - 554.7\,b_3 + 1419.7\,b_4 &= -1447.3. \end{aligned}$$


These equations may best be solved by using the inverse matrix of coefficients
$$(g_{ik}) = (P'_{ik})^{-1}$$
(see (4·19)) which will itself be required later,

$$(g_{ik}) = 10^{-6}\begin{pmatrix} 774 & 408 & 476 & 661\\ 408 & 562 & 385 & 526\\ 476 & 385 & 728 & 625\\ 661 & 526 & 625 & 1419 \end{pmatrix}.$$

We then obtain $b_1 = 0.6234$, $b_2 = 0.9387$, $b_3 = 0.4164$, $b_4 = -0.2534$.
Now
$$\sum_{i,k=1}^{4} b_iQ'_{ik} = \sum_{i}\Big(b_i\sum_{k} Q'_{ik}\Big) = (0.6234)(706.7) + \ldots = 3158.9,$$
hence the sum of squares of deviations from the separate regressions, i.e. $\sum_{i,j}(y_{ij} - Y_{ij})^2$, is, from (4·12),
$$\sum_{i,k} R'_{ik} - \sum_{i,k} b_iQ'_{ik} = 5702.5 - 3158.9 = 2543.6.$$

From (4·13) therefore, $s^2 = 2543.6/26 = 97.83$.
For the pooled regression, (4·14') gives
$$b = \sum_{i,k} Q'_{ik}\Big/\sum_{i,k} P'_{ik} = 1783.5/2088.25 = 0.8541,$$
and by (4·15), the sum of squares of deviations from the pooled regression, $\sum_{i,j}(y_{ij} - Y'_{ij})^2$, is
$$\sum_{i,k} R'_{ik} - b\sum_{i,k} Q'_{ik} = 5702.5 - (0.8541)(1783.5) = 4179.3.$$

The sum of squares due to differences between the separate regressions and the pooled regression is therefore $(4179.3 - 2543.6) = 1635.7$. The appropriate analysis of variance is:

















Source of variation                          Degrees of    Sum of      Mean
                                             freedom       squares     square
Due to pooled regression                         1         1523.2
Differences between separate regressions
  and pooled regression                          3         1635.7      545.2 = $s_1^2$
Due to separate regressions                      4         3158.9      789.7 = $s_2^2$
Deviations from separate regressions            26         2543.6      97.83 = $s^2$
Total                                           30         5702.5




















To test the significance of the general regression, the variance-ratio is 789.7/97.83 = 8.1*** (4 and 26 degrees of freedom), significant at 0.1%. For the homogeneity of the separate regression coefficients, the appropriate variance-ratio is 545.2/97.83 = 5.6** (3 and 26 degrees of freedom), which is significant at the 1% level. We conclude therefore that the regression is on the whole very significant, but there is strong evidence of heterogeneity as between samples. We may wish to enquire where this heterogeneity lies.
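The whole of the analysis of variance for Example 2 can be retraced from the tables of $P_{ik}$, $Q_{ik}$ and $R_{ik}$. The sketch below is an illustration only: the helper names `solve` and `primed` are hypothetical, and because the tabulated sums of products are rounded to whole numbers the results agree with the printed values only to about the third significant figure.

```python
def solve(a, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    size = len(m)
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(size):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][size] / m[i][i] for i in range(size)]

n, p = 11, 4
P = [[3714, 3550, 3345, 2400], [3550, 4619, 3248, 2054],
     [3345, 3248, 3769, 2219], [2400, 2054, 2219, 1893]]
Q = [[5229, 6309, 4490, 2061], [4640, 6931, 4031, 2162],
     [5375, 5913, 4624, 2448], [3665, 3899, 3034, 1603]]
R = [[10217, 11008, 8606, 4652], [11008, 15225, 9561, 4746],
     [8606, 9561, 7445, 3758], [4652, 4746, 3758, 2937]]

def primed(m):
    """Multiply element (i, k) by (delta_ik - 1/p)."""
    return [[((i == k) - 1 / p) * m[i][k] for k in range(p)] for i in range(p)]

Pp, Qp, Rp = primed(P), primed(Q), primed(R)
rhs = [sum(row) for row in Qp]              # right-hand sides of (4.11')
b = solve(Pp, rhs)                          # separate slopes b_1 .. b_4

total_ss = sum(map(sum, Rp))
separate_ss = sum(bi * ri for bi, ri in zip(b, rhs))
residual_ss = total_ss - separate_ss        # (4.12)
s2 = residual_ss / ((n - 1) * (p - 1) - p)  # (4.13)

b_pooled = sum(rhs) / sum(map(sum, Pp))     # (4.14')
pooled_ss = b_pooled * sum(rhs)
between_ss = separate_ss - pooled_ss        # p - 1 degrees of freedom

ratio_regression = (separate_ss / p) / s2
ratio_homogeneity = (between_ss / (p - 1)) / s2
print([round(x, 3) for x in b], round(s2, 2), round(b_pooled, 4))
print(round(ratio_regression, 1), round(ratio_homogeneity, 1))
```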








For the significance of the separate coefficients, we have
$$t_{b_i} = b_i/\{s\sqrt{g_{ii}}\}, \quad\text{where } s = \sqrt{97.83} = 9.891 \text{ (26 degrees of freedom).}$$
Hence $t_{b_1} = 0.6234/\{9.891\sqrt{(774 \times 10^{-6})}\} = 2.26$.
Similarly $t_{b_2} = 4.0^{**}$, $t_{b_3} = 1.55$, $t_{b_4} = 0.68$.
To test the significance of the difference between any two coefficients, $|b_i - b_k|$ for example, we have $t_{(b_i - b_k)}$ distributed as ‘Student’s’ $t$ (on the hypothesis $\beta_i = \beta_k$), where
$$t_{(b_i - b_k)} = |b_i - b_k|/\{s\sqrt{(g_{ii} + g_{kk} - 2g_{ik})}\}.$$
Examples are:
$$t_{(b_3 - b_4)} = 0.6698/\{9.891 \times 10^{-3}\sqrt{(728 + 1419 - 2 \times 625)}\} = 2.26^{*},$$
$$t_{(b_1 - b_3)} = 0.89, \qquad t_{(b_1 - b_2)} = 1.40, \qquad t_{(b_2 - b_3)} = 2.31^{*}.$$
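The individual $t$ values follow mechanically from the printed $b_i$, $s$ and the inverse matrix $(g_{ik})$. A minimal sketch, using the quoted rounded figures (so that agreement with the printed values is only to about two decimals) and hypothetical helper names `t_single` and `t_difference`:

```python
from math import sqrt

s = sqrt(97.83)                          # residual standard deviation, 26 d.f.
b = [0.6234, 0.9387, 0.4164, -0.2534]    # fore, back, hip, britch
g = [[774, 408, 476, 661],
     [408, 562, 385, 526],
     [476, 385, 728, 625],
     [661, 526, 625, 1419]]              # in units of 1e-6

def t_single(i):
    """t statistic for the hypothesis beta_i = 0 (0-based index)."""
    return abs(b[i]) / (s * sqrt(g[i][i] * 1e-6))

def t_difference(i, k):
    """t statistic for the hypothesis beta_i = beta_k (0-based indices)."""
    var_factor = (g[i][i] + g[k][k] - 2 * g[i][k]) * 1e-6
    return abs(b[i] - b[k]) / (s * sqrt(var_factor))

for i in range(4):
    print("b%d" % (i + 1), round(t_single(i), 2))
for i in range(4):
    for k in range(i + 1, 4):
        print("b%d - b%d" % (i + 1, k + 1), round(t_difference(i, k), 2))
```

Running all six pairwise comparisons, rather than the four quoted, also shows the comparisons of the fore and back regions with the britch region that the summary paragraph relies on.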


To summarize, the regression of weight of wool per square on the area of the square is 
adjudged significant for the fore and back regions, but not for the hip and britch regions. As 
regards the separate regression coefficients, that for the britch region is significantly less than 
for the other three regions; that for the back region is significantly greater than for the hip 


region; the differences between the fore region on one hand, and the back and hip regions on 
the other, are not significant. 


I wish to record my thanks to Dr J. Wishart for assistance and advice in compiling this paper. I am indebted also to Dr H. E. Daniels for initially drawing my attention to the problem dealt with; to the Wool Industries Research Association for placing at my disposal the data of Example 2; and to Professor E. S. Pearson for constructive advice on the presentation of the paper.


REFERENCES 


BARTLETT, M. S. (1933). On the theory of statistical regression. Proc. Roy. Soc. Edinb. 53, 260.

BARTLETT, M. S. (1934). The problem in statistics of testing several variances. Proc. Camb. Phil. Soc. 30, 164.

GALPIN, N. (1947). A study of wool growth. I. J. Agric. Sci. 37, 275.

KOŁODZIEJCZYK, ST. (1935). On an important class of statistical hypotheses. Biometrika, 27, 161.

WELCH, B. L. (1935). Some problems in the analysis of regression among k samples of two variables. Biometrika, 27, 145.

YATES, F. (1933). The principles of orthogonality and confounding in replicated experiments. J. Agric. Sci. 23, 108.

YATES, F. (1939). Tests of significance of the differences between regression coefficients derived from two sets of correlated variates. Proc. Roy. Soc. Edinb. 59, 184.













[ 47 ] 


CUMULANTS OF MULTIVARIATE MULTINOMIAL DISTRIBUTIONS 
By JOHN WISHART, Statistical Laboratory, University of Cambridge 


1. For the ordinary (Bernoulli) multinomial distribution in one variable a simple cumulant 
recurrence relation is due to Guldberg (1935) and is deduced as follows: 

Let an event, for which the chance of failure is $p_0$, happen in any one of $n$ ways, with probabilities $p_1, p_2, \ldots, p_n$. Then the chance that in a random sample of $s$ trials there will be $x_1$ successes of the first kind, $x_2$ of the second, ..., $x_n$ of the last, will be

$$\frac{s!}{x_0!\,x_1!\cdots x_n!}\;p_0^{x_0}p_1^{x_1}\cdots p_n^{x_n},$$

being the general term in the multinomial expansion of
$$(p_0 + p_1 + \ldots + p_n)^s,$$
where
$$p_0 = 1 - \sum_{i=1}^{n} p_i, \qquad x_0 = s - \sum_{i=1}^{n} x_i.$$


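The probability generating function (p.g.f.) is

$$\Big(p_0 + \sum_{i=1}^{n} p_i\alpha_i\Big)^{s} = a_0^{-s}\Big(1 + \sum_{i=1}^{n} a_i\alpha_i\Big)^{s} \tag{1·1}$$

if we put $p_i/p_0 = a_i$ ($i = 1, 2, \ldots, n$), so that $p_i = a_i/a_0$, where $a_0 = 1 + \sum_{i=1}^{n} a_i$.
To obtain the cumulant generating function (c.g.f.) we put $\alpha_i = e^{t_i}$ in (1·1) and take the natural logarithm. In the usual notation we then have

$$K = s\{-\ln(1 + \Sigma a_i) + \ln(1 + \Sigma a_ie^{t_i})\}.$$

It follows that
$$\frac{\partial K}{\partial t_i} = \frac{sa_ie^{t_i}}{1 + \Sigma a_ie^{t_i}} = \kappa_{\ldots1\ldots} + sa_i\Big(\frac{e^{t_i}}{1 + \Sigma a_ie^{t_i}} - \frac{1}{1 + \Sigma a_i}\Big) = \kappa_{\ldots1\ldots} + a_i\frac{\partial K}{\partial a_i}, \tag{1·2}$$

since $\kappa_{\ldots1\ldots}$, the first-order cumulant (or mean), is given by

$$\kappa_{\ldots1\ldots} = \Big(\frac{\partial K}{\partial t_i}\Big)_{t=0} = \frac{sa_i}{1 + \Sigma a_i}. \tag{1·3}$$

The unit in the subscript for $\kappa$ is regarded as being in the $i$th place.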

We now differentiate (1·2) $r_i$ times with respect to $t_i$ for all $i = 1, 2, \ldots, n$. Then on changing the order of the differentiation on the right-hand side and putting $t_i = 0$, for $i = 1, 2, \ldots, n$, we have

$$\kappa_{r_1r_2\ldots r_i+1\ldots r_n} = a_i\frac{\partial}{\partial a_i}\big(\kappa_{r_1r_2\ldots r_i\ldots r_n}\big), \tag{1·4}$$













where $r_i \geq 0$ with at least one non-zero $r$. This is Guldberg’s result, of which all simpler results are special cases. Thus for the ordinary binomial distribution we have

$$\kappa_{r+1} = a\frac{d\kappa_r}{da} \qquad (r \geq 1),$$

where $a = p/(1-p) = p/q$. Alternatively, we may write
$$\kappa_{r+1} = pq\frac{d\kappa_r}{dp},$$
the well-known result due to Frisch (1925), and rediscovered by Haldane (1940).
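The binomial recurrence $\kappa_{r+1} = pq\,d\kappa_r/dp$ is easily mechanized, since each $\kappa_r/s$ is a polynomial in $p$. The following sketch (the function names `next_cumulant` and `binomial_cumulants` are illustrative, not part of the paper) stores a polynomial as its list of coefficients in ascending powers of $p$ and applies the operator $p(1-p)\,d/dp$ repeatedly, starting from $\kappa_1/s = p$.

```python
def next_cumulant(poly):
    """Apply p(1 - p) d/dp to a polynomial in p given as coefficients [c0, c1, ...]."""
    deriv = [k * c for k, c in enumerate(poly)][1:]      # d/dp
    out = [0] * (len(deriv) + 2)
    for k, c in enumerate(deriv):
        out[k + 1] += c                                   # contribution of  p * c * p^k
        out[k + 2] -= c                                   # contribution of -p^2 * c * p^k
    return out

def binomial_cumulants(order):
    """Return kappa_r / s for r = 1..order as polynomials in p."""
    kappas = [[0, 1]]                                     # kappa_1 / s = p
    while len(kappas) < order:
        kappas.append(next_cumulant(kappas[-1]))
    return kappas

cumulants = binomial_cumulants(4)
for r, poly in enumerate(cumulants, start=1):
    print(r, poly)
```

The fourth polynomial, $p - 7p^2 + 12p^3 - 6p^4$, is the expanded form of $pq(1 - 6pq)$, in agreement with the fourth cumulant $spq(1 - 6pq)$ of the binomial.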
The cumulants are easily worked out in the multinomial case. We start with (1·3), namely,

$$\kappa_{\ldots1\ldots} = sa_i/a_0 = sp_i,$$

and use the relations
$$a_i\frac{\partial p_i}{\partial a_i} = -a_i\frac{\partial q_i}{\partial a_i} = p_iq_i, \quad\text{where } q_i = 1 - p_i,$$
$$a_i\frac{\partial p_j}{\partial a_i} = -a_i\frac{\partial q_j}{\partial a_i} = -p_ip_j \qquad (i \neq j).$$

With the minimum of algebraic manipulation we then obtain the cumulants to any desired order. To the fourth order they are:

$$\left.\begin{aligned}
\kappa_{\ldots1\ldots} &= sp_i, & \kappa_{\ldots4\ldots} &= sp_iq_i(1 - 6p_iq_i),\\
\kappa_{\ldots2\ldots} &= sp_iq_i, & \kappa_{\ldots31\ldots} &= -sp_ip_j(1 - 6p_iq_i),\\
\kappa_{\ldots11\ldots} &= -sp_ip_j, & \kappa_{\ldots22\ldots} &= -sp_ip_j\{(q_i - p_i)(q_j - p_j) + 2p_ip_j\},\\
\kappa_{\ldots3\ldots} &= sp_iq_i(q_i - p_i), & \kappa_{\ldots211\ldots} &= 2sp_ip_jp_k(q_i - 2p_i),\\
\kappa_{\ldots21\ldots} &= -sp_ip_j(q_i - p_i), & \kappa_{\ldots1111\ldots} &= -6sp_ip_jp_kp_l.\\
\kappa_{\ldots111\ldots} &= 2sp_ip_jp_k, & &
\end{aligned}\right\} \tag{1·5}$$

In the above the order of the non-zero subscripts in any $\kappa$ is that of the $i, j, k, l$ introduced on the right-hand side. These results comprise all special cases. Mention should be made of papers by Qvale (1932) and Gotaas (1936), who dealt with the conditions to be satisfied by a general discontinuous frequency distribution for a simple cumulant recurrence relationship of the pattern of (1·4) to hold.
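The lower-order formulae can be confirmed by brute-force enumeration, since joint cumulants up to the third order coincide with the corresponding central moments. The sketch below does this for a Bernoulli multinomial with two success classes; the function names `multinomial_pmf` and `central_moment` and the numerical values of $s$ and the $p_i$ are illustrative assumptions.

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """Probability of the given class counts in sum(counts) independent trials."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    value = float(coef)
    for c, pr in zip(counts, probs):
        value *= pr ** c
    return value

def central_moment(s, probs, powers):
    """E[(x1 - s*p1)^a (x2 - s*p2)^b] for probs = (p0, p1, p2) and powers = (a, b)."""
    p0, p1, p2 = probs
    a, b = powers
    total = 0.0
    for x1 in range(s + 1):
        for x2 in range(s + 1 - x1):
            w = multinomial_pmf((s - x1 - x2, x1, x2), probs)
            total += w * (x1 - s * p1) ** a * (x2 - s * p2) ** b
    return total

s, probs = 5, (0.5, 0.3, 0.2)
p1, p2 = probs[1], probs[2]
q1 = 1 - p1
variance = central_moment(s, probs, (2, 0))      # compare with s p q
covariance = central_moment(s, probs, (1, 1))    # compare with -s p_i p_j
mixed_third = central_moment(s, probs, (2, 1))   # compare with -s p_i p_j (q_i - p_i)
print(variance, covariance, mixed_third)
```

Fourth-order cumulants cannot be checked in quite this way, because from the fourth order onwards cumulants and central moments differ.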


2. The negative binomial, i.e. the binomial distribution with negative index, is well known. Its multinomial generalization (for one variable) has been called the Pascal multinomial distribution, and this name will be used to distinguish it from the Bernoulli distribution. We are here concerned with the joint probability that an event $A_i$ (with constant probability $p_i$) shall occur $x_i$ times, for $i = 1, 2, \ldots, n$, and that the event not $(A_1, A_2, \ldots, A_n)$, with probability $p_0$, shall occur $s$ times (including the result of the last trial) out of $s + \sum_{i=1}^{n} x_i$ repeated trials.

This probability is given by
$$\frac{(s - 1 + \Sigma x_i)!}{(s-1)!\,x_1!\cdots x_n!}\;p_0^{s}p_1^{x_1}\cdots p_n^{x_n},$$
being the general term in the multinomial expansion of
$$p_0^{s}(1 - p_1 - p_2 - \ldots - p_n)^{-s},$$
where $p_0 = 1 - \Sigma p_i$. The p.g.f. is
$$p_0^{s}\Big(1 - \sum_{i=1}^{n} p_i\alpha_i\Big)^{-s}, \tag{2·1}$$
i=1 










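while the c.g.f. may be written
$$K = s\{\ln(1 - \Sigma p_i) - \ln(1 - \Sigma p_ie^{t_i})\}.$$

It follows that
$$\frac{\partial K}{\partial t_i} = \frac{sp_ie^{t_i}}{1 - \Sigma p_ie^{t_i}} = \kappa_{\ldots1\ldots} + sp_i\Big(\frac{e^{t_i}}{1 - \Sigma p_ie^{t_i}} - \frac{1}{1 - \Sigma p_i}\Big) = \kappa_{\ldots1\ldots} + p_i\frac{\partial K}{\partial p_i}, \tag{2·2}$$

since
$$\kappa_{\ldots1\ldots} = \Big(\frac{\partial K}{\partial t_i}\Big)_{t=0} = \frac{sp_i}{1 - \Sigma p_i}. \tag{2·3}$$

As before, differentiate (2·2) $r_i$ times with respect to $t_i$ for all $i = 1, 2, \ldots, n$. Changing the order of differentiation and putting $t_i = 0$, for $i = 1, 2, \ldots, n$, we have

$$\kappa_{r_1r_2\ldots r_i+1\ldots r_n} = p_i\frac{\partial}{\partial p_i}\big(\kappa_{r_1r_2\ldots r_i\ldots r_n}\big), \tag{2·4}$$

where $r_i \geq 0$ with at least one non-zero $r$. This result was also deduced by Guldberg (1935). A special case is that for the negative binomial, and is
$$\kappa_{r+1} = p\frac{d\kappa_r}{dp} \qquad (r \geq 1),$$
as used, with a slight change of notation, by the author in a former paper (Wishart, 1947). It should be noted that the simplest form of the Bernoulli relation is in terms of $a_i$, as in (1·4), whereas $p_i$ takes the place of $a_i$ in the simplest form (2·4) of the Pascal relation.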

On the other hand, the cumulants of the Bernoulli distribution are most simply expressed 

in terms of p; and q;, as in (1-5), whereas those of the Pascal distribution can be written most 
simply in terms of a; and 6; = 1+a;. Thus we start with (2-3), namely, 


K_1., = SP;/Po = 80;, 
and use the relations 


cy pa) > 
0a; 0b; Ca; 


pa Maes, Man Moane, Gap 
Pity, Pop, i%> Pi ap, 5 dp, % 5 J). 


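To the fourth order the cumulants are:
\[
\begin{aligned}
\kappa_{..1..} &= sa_i, & \kappa_{..4..} &= sa_ib_i(1+6a_ib_i),\\
\kappa_{..2..} &= sa_ib_i, & \kappa_{..31..} &= sa_ia_j(1+6a_ib_i),\\
\kappa_{..11..} &= sa_ia_j, & \kappa_{..22..} &= sa_ia_j\{(b_i+a_i)(b_j+a_j)+2a_ia_j\},\\
\kappa_{..3..} &= sa_ib_i(b_i+a_i), & \kappa_{..211..} &= 2sa_ia_ja_k(b_i+2a_i),\\
\kappa_{..21..} &= sa_ia_j(b_i+a_i), & \kappa_{..1111..} &= 6sa_ia_ja_ka_l.\\
\kappa_{..111..} &= 2sa_ia_ja_k, & &
\end{aligned}
\tag{2·5}
\]
These results comprise all special cases. The formulae in (2·5) are similar in form to those in
(1·5), but with the signs all positive, and we may perhaps repeat that
\[ p_i/p_0 = a_i, \quad\text{so that}\quad p_i = a_i/a_0, \]
where $p_0 = 1-\Sigma p_i$, $a_0 = 1+\Sigma a_i$.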
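
The univariate entries of (2·5) can be checked numerically. The following is a minimal sketch, not part of the original paper: it assumes the single-variate Pascal law $P(x) = \binom{s+x-1}{x}p_0^{s}p^{x}$, sums the series directly, and compares the first four cumulants with $sa$, $sab$, $sab(b+a)$ and $sab(1+6ab)$. The function names and the truncation length `terms` are illustrative choices.

```python
from math import comb

def pascal_cumulants(s, p, terms=400):
    """First four cumulants of P(x) = C(s+x-1, x) (1-p)^s p^x by direct summation.

    `terms` truncates the series; it is an illustrative choice and must be large
    enough for the neglected tail to be negligible.
    """
    p0 = 1.0 - p
    probs = [comb(s + x - 1, x) * p0**s * p**x for x in range(terms)]
    mean = sum(x * w for x, w in enumerate(probs))

    def central(r):
        return sum((x - mean) ** r * w for x, w in enumerate(probs))

    m2, m3, m4 = central(2), central(3), central(4)
    # kappa_2 and kappa_3 equal the central moments; kappa_4 = mu_4 - 3 mu_2^2.
    return mean, m2, m3, m4 - 3.0 * m2**2

def pascal_formulae(s, p):
    """The closed forms s*a, s*a*b, s*a*b*(b+a), s*a*b*(1+6ab) with a = p/p0, b = 1+a."""
    a = p / (1.0 - p)
    b = 1.0 + a
    return s * a, s * a * b, s * a * b * (b + a), s * a * b * (1.0 + 6.0 * a * b)

if __name__ == "__main__":
    for got, want in zip(pascal_cumulants(3, 0.3), pascal_formulae(3, 0.3)):
        print(f"{got:.10f}  {want:.10f}")
```

The two columns printed should agree to within rounding error, which also illustrates the remark that the Pascal formulae carry positive signs throughout.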
3. We shall now consider the corresponding multivariate distributions of both the Bernoulli
and Pascal kinds, with a view to deriving the appropriate cumulant recurrence relations.











50 Cumulants of multivariate multinomial distributions


From these we shall work out all the different cumulants, up to the fourth order. The work is 
straightforward, the chief difficulty being to devise a satisfactory notation in order to identify 
and condense the formulae. The methods are sufficiently illustrated by taking the simplest 
case of the bivariate binomial, first for the Bernoulli case and then for Pascal. 

The two variates are the numbers of successes in two events. Let the respective probabilities
be as in the following table:

                                   2nd event
                         Success     Failure     Total
   1st event   Success   $p_{11}$    $p_{10}$    $p$
               Failure   $p_{01}$    $p_{00}$    $q$
               Total     $p'$        $q'$        $1$

We have $q = 1-p$, $q' = 1-p'$ and $p_{00} = 1-p_{10}-p_{01}-p_{11}$. We note also that
\[ p_{00}p_{11}-p_{10}p_{01} = p_{11}-pp'. \]


The probability of $x_1$ successes in the first event (and $s-x_1$ failures) and of $x_2$ successes in
the second event (and $s-x_2$ failures) is given by the following function of the two variables
$x_1$ and $x_2$:
\[ f(x_1,x_2) = \binom{s}{x_1}\sum_{i=0}^{\min(x_1,x_2)}\binom{x_1}{i}\binom{s-x_1}{x_2-i}\,
   p_{11}^{\,i}\,p_{10}^{\,x_1-i}\,p_{01}^{\,x_2-i}\,p_{00}^{\,s-x_1-x_2+i}, \tag{3·1} \]
in which $\binom{c}{d} = \dfrac{c!}{d!\,(c-d)!}$.


It may be deduced from (3·1) (or more simply from the p.g.f. below; see §4) that the
marginal distributions are represented by the binomials $(q+p)^s$ and $(q'+p')^s$ respectively.
Also when $p_{11} = pp'$, so that $p_{10} = pq'$, $p_{01} = p'q$, $p_{00} = qq'$, (3·1) becomes the product of two
binomial forms, so that the distributions are independent.

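The p.g.f. is
\[ (p_{00}+p_{10}\alpha+p_{01}\beta+p_{11}\alpha\beta)^{s} = a_{00}^{-s}(1+a_{10}\alpha+a_{01}\beta+a_{11}\alpha\beta)^{s}, \tag{3·2} \]
if we put $p_{10}/p_{00} = a_{10}$, $p_{01}/p_{00} = a_{01}$, $p_{11}/p_{00} = a_{11}$, so that $p_{00}^{-1} = 1+a_{10}+a_{01}+a_{11} = a_{00}$, say.

Putting $\alpha = e^{t}$, $\beta = e^{u}$ we have for the c.g.f.
\[ K = s\{-\ln a_{00}+\ln(1+a_{10}e^{t}+a_{01}e^{u}+a_{11}e^{t+u})\}. \]
Then
\[ \frac{\partial K}{\partial t} = \frac{se^{t}(a_{10}+a_{11}e^{u})}{D}, \quad\text{where } D = 1+a_{10}e^{t}+a_{01}e^{u}+a_{11}e^{t+u}, \]
\[ \phantom{\frac{\partial K}{\partial t}} = \kappa_{10}+s\Bigl[\frac{e^{t}(a_{10}+a_{11}e^{u})}{D}-\frac{a_{10}+a_{11}}{a_{00}}\Bigr], \]
and similarly
\[ \frac{\partial K}{\partial u} = \kappa_{01}+s\Bigl[\frac{e^{u}(a_{01}+a_{11}e^{t})}{D}-\frac{a_{01}+a_{11}}{a_{00}}\Bigr], \]
where
\[ \kappa_{10} = \Bigl(\frac{\partial K}{\partial t}\Bigr)_{t=u=0} = \frac{s(a_{10}+a_{11})}{a_{00}}, \qquad
   \kappa_{01} = \Bigl(\frac{\partial K}{\partial u}\Bigr)_{t=u=0} = \frac{s(a_{01}+a_{11})}{a_{00}}, \tag{3·3} \]
are the means of the variables $x_1$ and $x_2$.
It follows that
\[ \frac{\partial K}{\partial t} = \kappa_{10}+a_{10}\frac{\partial K}{\partial a_{10}}+a_{11}\frac{\partial K}{\partial a_{11}}, \qquad
   \frac{\partial K}{\partial u} = \kappa_{01}+a_{01}\frac{\partial K}{\partial a_{01}}+a_{11}\frac{\partial K}{\partial a_{11}}. \tag{3·4} \]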




























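Differentiating $r_1$ times with respect to $t$ and $r_2$ times with respect to $u$, and changing the order
of differentiation on the right-hand side, we have, on putting $t = u = 0$,
\[
\begin{aligned}
\kappa_{r_1+1,\,r_2} &= \Bigl(a_{10}\frac{\partial}{\partial a_{10}}+a_{11}\frac{\partial}{\partial a_{11}}\Bigr)\kappa_{r_1,\,r_2},\\
\kappa_{r_1,\,r_2+1} &= \Bigl(a_{01}\frac{\partial}{\partial a_{01}}+a_{11}\frac{\partial}{\partial a_{11}}\Bigr)\kappa_{r_1,\,r_2}.
\end{aligned}
\tag{3·5}
\]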


These are the required cumulant recurrence relations. To use them we start with (3·3),
namely, $\kappa_{10} = sp$, $\kappa_{01} = sp'$, and can then readily verify that if $A$ and $A'$ be written for the
complete operators on the right-hand side of (3·5), we have
\[ A(p) = -A(q) = pq, \qquad A'(p') = -A'(q') = p'q', \]
\[ A(p') = A'(p) = -A(q') = -A'(q) = p_{11}-pp'; \]
also
\[ A(p_{11}-pp') = (q-p)(p_{11}-pp'), \qquad A'(p_{11}-pp') = (q'-p')(p_{11}-pp'). \]
Let us agree to write $p_{(11)}$ for $p_{11}-pp'$. Then, to the fourth order,
\[
\begin{aligned}
\kappa_{10} &= sp, & \kappa_{01} &= sp',\\
\kappa_{20} &= spq, \quad \kappa_{11} = sp_{(11)}, & \kappa_{02} &= sp'q',\\
\kappa_{30} &= spq(q-p), & \kappa_{03} &= sp'q'(q'-p'),\\
\kappa_{21} &= sp_{(11)}(q-p), & \kappa_{12} &= sp_{(11)}(q'-p'),\\
\kappa_{40} &= spq(1-6pq), & \kappa_{04} &= sp'q'(1-6p'q'),\\
\kappa_{31} &= sp_{(11)}(1-6pq), & \kappa_{13} &= sp_{(11)}(1-6p'q'),\\
\kappa_{22} &= sp_{(11)}\{(q-p)(q'-p')-2p_{(11)}\}. & &
\end{aligned}
\tag{3·6}
\]


In the special case of $p_{11} = 0$ we see from (3·2) that the case is that of a univariate trinomial,
and, in fact, formulae (3·6) become equivalent to (1·5).
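
The mixed cumulants in (3·6) can be confirmed exactly for a single trial ($s = 1$), since cumulants of $s$ independent trials simply add. The sketch below is an illustration rather than the author's method: it computes joint cumulants from the standard partition formula (a sum over set partitions with weights $(-1)^{k-1}(k-1)!$) instead of the recurrence (3·5), and the helper names `set_partitions` and `joint_cumulant` are hypothetical.

```python
from fractions import Fraction
from math import factorial

def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + part

def joint_cumulant(indices, pmf):
    """Joint cumulant of the coordinates named in `indices` (repeats allowed).

    `pmf` maps outcome tuples to probabilities, e.g. {(1, 0): p10, ...}.
    """
    def moment(block):
        total = Fraction(0)
        for outcome, prob in pmf.items():
            term = Fraction(prob)
            for i in block:
                term *= outcome[i]
            total += term
        return total

    total = Fraction(0)
    for part in set_partitions(list(range(len(indices)))):
        k = len(part)
        term = Fraction((-1) ** (k - 1) * factorial(k - 1))
        for block in part:
            term *= moment([indices[j] for j in block])
        total += term
    return total

if __name__ == "__main__":
    # An arbitrary 2 x 2 table of probabilities (illustrative values only).
    p11, p10, p01, p00 = Fraction(1, 5), Fraction(1, 10), Fraction(1, 5), Fraction(1, 2)
    pmf = {(1, 1): p11, (1, 0): p10, (0, 1): p01, (0, 0): p00}
    p, pd = p11 + p10, p11 + p01          # p and p'
    q, qd = 1 - p, 1 - pd                 # q and q'
    p_11 = p11 - p * pd                   # p_(11)
    print(joint_cumulant((0, 0, 1, 1), pmf), p_11 * ((q - p) * (qd - pd) - 2 * p_11))
```

Exact rational arithmetic is used so that the comparison with $p_{(11)}\{(q-p)(q'-p')-2p_{(11)}\}$ is an identity rather than an approximation.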


4. There is a corresponding bivariate negative binomial, or Pascal, distribution, for which
the p.g.f. is
\[ p_{00}^{s}(1-p_{10}\alpha-p_{01}\beta-p_{11}\alpha\beta)^{-s}, \tag{4·1} \]
where $p_{00} = 1-p_{10}-p_{01}-p_{11}$. To show that this is the correct form, write it as
\[ p_{00}^{s}\{(1-p_{01}\beta)-(p_{10}+p_{11}\beta)\alpha\}^{-s}
   = \sum_{x_1}p_{00}^{s}\binom{s+x_1-1}{x_1}(1-p_{01}\beta)^{-s-x_1}(p_{10}+p_{11}\beta)^{x_1}\alpha^{x_1}. \]
It follows from this that the marginal distribution of $x_2$, obtained by summing the general
term of the distribution whose p.g.f. is (4·1) from $x_1 = 0$ to $\infty$, has the p.g.f.
\[
\begin{aligned}
\sum_{x_1=0}^{\infty}\Bigl\{p_{00}^{s}\binom{s+x_1-1}{x_1}(1-p_{01}\beta)^{-s-x_1}(p_{10}+p_{11}\beta)^{x_1}\Bigr\}
 &= p_{00}^{s}\{(1-p_{01}\beta)-(p_{10}+p_{11}\beta)\}^{-s}\\
 &= p_{00}^{s}\{(1-p_{10})-(p_{01}+p_{11})\beta\}^{-s},
\end{aligned}
\]
and therefore represents a negative binomial or Pascal distribution.







Likewise, the marginal distribution of $x_1$ has the p.g.f.
\[ p_{00}^{s}\{(1-p_{01})-(p_{10}+p_{11})\alpha\}^{-s}. \]


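Now beginning with (4·1) and writing $\alpha = e^{t}$, $\beta = e^{u}$, it is easy to show by the method of
§3, but without this time transforming to $a_{10}$, $a_{01}$, etc., that
\[ \kappa_{r_1+1,\,r_2} = \Bigl(p_{10}\frac{\partial}{\partial p_{10}}+p_{11}\frac{\partial}{\partial p_{11}}\Bigr)\kappa_{r_1,\,r_2}, \qquad
   \kappa_{r_1,\,r_2+1} = \Bigl(p_{01}\frac{\partial}{\partial p_{01}}+p_{11}\frac{\partial}{\partial p_{11}}\Bigr)\kappa_{r_1,\,r_2}, \tag{4·2} \]
with $\kappa_{10} = s(p_{10}+p_{11})/p_{00} = s(a_{10}+a_{11})$, $\kappa_{01} = s(p_{01}+p_{11})/p_{00} = s(a_{01}+a_{11})$.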


Let us write $a$ for $a_{10}+a_{11}$ and $b$ for $1+a$; likewise $a'$ for $a_{01}+a_{11}$ and $b'$ for $1+a'$. Then
using $P$ and $P'$ for the complete operators in (4·2) we readily verify that
\[ P(a) = P(b) = ab, \qquad P'(a') = P'(b') = a'b', \]
\[ P(a') = P(b') = P'(a) = P'(b) = a_{11}+aa', \]
\[ P(a_{11}+aa') = (b+a)(a_{11}+aa'), \qquad P'(a_{11}+aa') = (b'+a')(a_{11}+aa'). \]
Let us write $a_{(11)}$ for $a_{11}+aa'$. Then, to the fourth order,
\[
\begin{aligned}
\kappa_{10} &= sa, & \kappa_{01} &= sa',\\
\kappa_{20} &= sab, \quad \kappa_{11} = sa_{(11)}, & \kappa_{02} &= sa'b',\\
\kappa_{30} &= sab(b+a), & \kappa_{03} &= sa'b'(b'+a'),\\
\kappa_{21} &= sa_{(11)}(b+a), & \kappa_{12} &= sa_{(11)}(b'+a'),\\
\kappa_{40} &= sab(1+6ab), & \kappa_{04} &= sa'b'(1+6a'b'),\\
\kappa_{31} &= sa_{(11)}(1+6ab), & \kappa_{13} &= sa_{(11)}(1+6a'b'),\\
\kappa_{22} &= sa_{(11)}\{(b+a)(b'+a')+2a_{(11)}\}. & &
\end{aligned}
\tag{4·3}
\]


5. The reciprocal character of the Bernoulli and Pascal results, in the notation which has
been adopted, should by now be sufficiently obvious for it to be necessary only to consider
one of these cases in its most general multivariate multinomial form. We shall choose the
more familiar Bernoulli case. The general distribution will be one in $m$ variables, such that the
$m$ marginal distributions are multinomials, not necessarily of the same order. A typical one
will be described as a $(n_i+1)$-nomial, denoting that the marginal distribution for $x_i$ (where
$i = 1,2,\ldots,m$) is a multinomial of the $(n_i+1)$th order. Thus it will be derived by considering
the chance of success of an event in any one of $n_i$ ways, with probabilities $p_1^{(i)}, p_2^{(i)}, \ldots, p_{n_i}^{(i)}$,
failure being denoted by $p_0^{(i)}$. The superscript $(i)$ denotes the practice already exemplified in
§§ 3 and 4, i.e. $p$ will refer to the first variate, $p'$ to the second, $p''$ to the third and so on. The
letter $j$ will denote the general member of the subscripts $1, 2, \ldots, n_i$.

These $p$'s which have been defined are marginal probabilities. The primary probabilities
will be denoted by the letter $p$ with $m$ suffices, taken in order for the variates (or events)
$1,2,\ldots,m$, each suffix being one of the numbers $0, 1, 2, \ldots, n_i$. Failure in all events will be
denoted by $p_{00..0}$. There will be a unitary series, of which a typical set is $p_{0..1..0}$, $p_{0..2..0}$, ...,
$p_{0..n_i..0}$, altogether $\sum_{i=1}^{m}(n_i)$ probabilities. The binary series will have two non-zero suffices,
the ternary series three, and so on. If $\Sigma(p)$ denotes the sum of all $p$'s with a non-zero suffix
anywhere, $p_{00..0}$ will be $1-\Sigma(p)$. We may, as before, put $p_{....}/p_{00..0} = a_{....}$, where $p_{....}$
















denotes any $p$ except $p_{00..0}$, and it follows that $p_{....} = a_{....}/a_{00..0}$, where $a_{00..0} = 1+\Sigma(a)$, this
last summation denoting the sum of all $a$'s with a non-zero suffix anywhere.

A proof on the same lines as in the preceding particular case dealt with in §3, details of
which it is hardly necessary to give, yields a set of cumulant recurrence relations which may be
expressed symbolically by
\[ \kappa_{(r_{ij}+1)} = A_j^{(i)}\kappa_{(r_{ij})}. \tag{5·1} \]


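There are $\Sigma(n_i)$ separate formulae in this expression, arranged in $m$ sets. $A_j^{(i)}$ is an operator of
the form $\Sigma\bigl(a\,\partial/\partial a\bigr)$, i.e. the sum of all terms of the form $a\,\partial/\partial a$ with suffices to the $a$'s such that
$j$ occurs in the $i$th suffix, the number of terms being such as to permit of all possible combina-
tions of the numbers 0, 1, 2, etc., in the remaining suffices. Thus, to illustrate from a (4 x 3 x 2)
table, i.e. a three-variate case in which the first variate is 4-nomial, the second 3-nomial and
the third binomial, we shall have:
\[
\begin{aligned}
A_1 &= a_{100}\,\partial/\partial a_{100}+a_{110}\,\partial/\partial a_{110}+a_{120}\,\partial/\partial a_{120}+a_{101}\,\partial/\partial a_{101}+a_{111}\,\partial/\partial a_{111}+a_{121}\,\partial/\partial a_{121},\\
A_2 &= a_{200}\,\partial/\partial a_{200}+a_{210}\,\partial/\partial a_{210}+a_{220}\,\partial/\partial a_{220}+a_{201}\,\partial/\partial a_{201}+a_{211}\,\partial/\partial a_{211}+a_{221}\,\partial/\partial a_{221},\\
A_3 &= a_{300}\,\partial/\partial a_{300}+a_{310}\,\partial/\partial a_{310}+a_{320}\,\partial/\partial a_{320}+a_{301}\,\partial/\partial a_{301}+a_{311}\,\partial/\partial a_{311}+a_{321}\,\partial/\partial a_{321},\\
A'_1 &= a_{010}\,\partial/\partial a_{010}+a_{110}\,\partial/\partial a_{110}+a_{210}\,\partial/\partial a_{210}+a_{310}\,\partial/\partial a_{310}+a_{011}\,\partial/\partial a_{011}+a_{111}\,\partial/\partial a_{111}\\
     &\quad +a_{211}\,\partial/\partial a_{211}+a_{311}\,\partial/\partial a_{311},\\
A'_2 &= a_{020}\,\partial/\partial a_{020}+a_{120}\,\partial/\partial a_{120}+a_{220}\,\partial/\partial a_{220}+a_{320}\,\partial/\partial a_{320}+a_{021}\,\partial/\partial a_{021}+a_{121}\,\partial/\partial a_{121}\\
     &\quad +a_{221}\,\partial/\partial a_{221}+a_{321}\,\partial/\partial a_{321},\\
A''_1 &= a_{001}\,\partial/\partial a_{001}+a_{101}\,\partial/\partial a_{101}+a_{201}\,\partial/\partial a_{201}+a_{301}\,\partial/\partial a_{301}+a_{011}\,\partial/\partial a_{011}+a_{111}\,\partial/\partial a_{111}\\
     &\quad +a_{211}\,\partial/\partial a_{211}+a_{311}\,\partial/\partial a_{311}+a_{021}\,\partial/\partial a_{021}+a_{121}\,\partial/\partial a_{121}+a_{221}\,\partial/\partial a_{221}+a_{321}\,\partial/\partial a_{321}.
\end{aligned}
\tag{5·2}
\]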





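$\kappa_{(r_{ij})}$ denotes a cumulant which may be written in full as $\kappa$ with the array of suffices
\[
\begin{array}{cccc}
r_{11} & r_{12} & \ldots & r_{1n_1}\\
r_{21} & r_{22} & \ldots & r_{2n_2}\\
\vdots & & & \vdots\\
r_{m1} & r_{m2} & \ldots & r_{mn_m}
\end{array}
\]
A line is devoted to each variate, and the $r$'s denote orders of cumulants, the total order being
$\Sigma(r)$. $r_{ij}$ occurs in the $i$th row and $j$th column. $\kappa_{(r_{ij}+1)}$ is a similar cumulant in which $r_{ij}$ is
replaced by $r_{ij}+1$.
As before, we begin with any $\kappa_{..1..} = sp_j^{(i)}$.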


To work out the cumulants to any fixed order we need only go to an order in the multivariate
multinomial sufficient to distinguish all the separate cumulants that can occur. In recording
these we need only put down a typical one of each kind. To the fourth order, for example, we
need not go beyond a (5 x 4 x 3 x 2) table. As a check, indeed, we can enumerate the formulae
that can arise from such a table, even although many of the formulae in the list can be derived
from simpler tables. In the results which follow the separate patterns are given, together
with the number of formulae which conform to this type and the cumulant for the type. The
cumulants are distinguished by their suffices, with dots separating the variates. Thus $\kappa_{110.001}$,
for example, denotes this particular third order cumulant derived from a bivariate 4-nomial.
To condense the formulae we shall extend the $p_{(\ )}$ notation already used in § 3. Thus:
\[
\begin{aligned}
p_{(12)} &= p_{12}-p_1p'_2,\\
p_{(12.)} &= p_{12.}-p_1p'_2,\\
p_{[12]} &= p_{12}-2p_1p'_2, \text{ etc.},\\
p_{(123)} &= p_{123}-p_{12.}p''_3-p_{1.3}p'_2-p_{.23}p_1+2p_1p'_2p''_3, \text{ etc.},
\end{aligned}
\]
in which $p_{12.}$ denotes the sum of all primary probabilities $p$ in which the first and second
suffices are 1 and 2, etc. Single suffix $p$'s are marginal probabilities as already defined:
\[
\begin{aligned}
p_{(1234)} = p_{1234} &- (p_{123.}p'''_4+p_{12.4}p''_3+p_{1.34}p'_2+p_{.234}p_1)\\
 &- (p_{12..}p_{..34}+p_{1.3.}p_{.2.4}+p_{1..4}p_{.23.})\\
 &+ 2(p_{12..}p''_3p'''_4+p_{1.3.}p'_2p'''_4+p_{1..4}p'_2p''_3+p_{.23.}p_1p'''_4+p_{.2.4}p_1p''_3+p_{..34}p_1p'_2)\\
 &- 6p_1p'_2p''_3p'''_4, \text{ etc.},
\end{aligned}
\]
in which $p_{123.}$ denotes the sum of all primary probabilities in which the first, second and third
suffices are 1, 2, 3, etc.
6. Patterns and cumulants from a (5 x 4 x 3 x 2) table, to the fourth order.

Pattern (suffices)     Number of cases   Cumulant

1st order
1                      10                $\kappa_{1} = sp$

2nd order
2                      10                $\kappa_{2} = spq$
11                     10                $\kappa_{11} = -sp_1p_2$
1.1                    10                $\kappa_{1.1} = sp_{(11)}$
10.01                  25                $\kappa_{10.01} = sp_{(12)}$

3rd order
3                      10                $\kappa_{3} = spq(q-p)$
21                     20                $\kappa_{21} = -sp_1p_2(q_1-p_1)$
2.1                    20                $\kappa_{2.1} = sp_{(11)}(q-p)$
20.01                  50                $\kappa_{20.01} = sp_{(12)}(q_1-p_1)$
111                    5                 $\kappa_{111} = 2sp_1p_2p_3$
11.10                  35                $\kappa_{11.10} = -s(p_1p_{(21)}+p_2p_{(11)})$
110.001                30                $\kappa_{110.001} = -s(p_1p_{(23)}+p_2p_{(13)})$
1.1.1                  5                 $\kappa_{1.1.1} = sp_{(111)}$
10.10.01               30                $\kappa_{10.10.01} = sp_{(112)}$
100.010.001            15                $\kappa_{100.010.001} = sp_{(123)}$

4th order
4                      10                $\kappa_{4} = spq(1-6pq)$
31                     20                $\kappa_{31} = -sp_1p_2(1-6p_1q_1)$
22                     10                $\kappa_{22} = -sp_1p_2\{(q_1-p_1)(q_2-p_2)+2p_1p_2\}$
3.1                    20                $\kappa_{3.1} = sp_{(11)}(1-6pq)$
2.2                    10                $\kappa_{2.2} = sp_{(11)}\{(q-p)(q'-p')-2p_{(11)}\}$
30.01                  50                $\kappa_{30.01} = sp_{(12)}(1-6p_1q_1)$
20.02                  25                $\kappa_{20.02} = sp_{(12)}\{(q_1-p_1)(q'_2-p'_2)-2p_{(12)}\}$
211                    15                $\kappa_{211} = 2sp_1p_2p_3(q_1-2p_1)$
21.10                  35                $\kappa_{21.10} = -s\{p_1p_{(21)}(q_1-p_1)+p_2p_{(11)}(q_1-3p_1)\}$
12.10                  35                $\kappa_{12.10} = -s\{p_1p_{(21)}(q_2-3p_2)+p_2p_{(11)}(q_2-p_2)\}$
11.20                  35                $\kappa_{11.20} = -s\{(p_1p_{(21)}+p_2p_{(11)})(q'_1-p'_1)+2p_{(11)}p_{(21)}\}$
210.001                60                $\kappa_{210.001} = -s\{p_1p_{(23)}(q_1-p_1)+p_2p_{(13)}(q_1-3p_1)\}$
110.002                30                $\kappa_{110.002} = -s\{(p_1p_{(23)}+p_2p_{(13)})(q'_3-p'_3)+2p_{(13)}p_{(23)}\}$
2.1.1                  15                $\kappa_{2.1.1} = s\{p_{(111)}(q-p)-2p_{(11.)}p_{(1.1)}\}$
20.10.01               60                $\kappa_{20.10.01} = s\{p_{(112)}(q_1-p_1)-2p_{(11.)}p_{(1.2)}\}$
10.10.02               30                $\kappa_{10.10.02} = s\{p_{(112)}(q''_2-p''_2)-2p_{(1.2)}p_{(.12)}\}$
200.010.001            45                $\kappa_{200.010.001} = s\{p_{(123)}(q_1-p_1)-2p_{(12.)}p_{(1.3)}\}$
1111                   …                 $\kappa_{1111} = -6sp_1p_2p_3p_4$
111.100                …                 $\kappa_{111.100} = 2s(p_2p_3p_{(11)}+p_3p_1p_{(21)}+p_1p_2p_{(31)})$
0111.1000              …                 $\kappa_{0111.1000} = 2s(p_3p_4p_{(21)}+p_2p_4p_{(31)}+p_2p_3p_{(41)})$
11.11                  …                 $\kappa_{11.11} = -s(p_{[11]}p_{[22]}+p_{[12]}p_{[21]}-2p_1p'_1p_2p'_2)$
110.011                …                 $\kappa_{110.011} = -s(p_{[12]}p_{[23]}+p_{[13]}p_{[22]}-2p_1p_2p'_2p'_3)$
1100.0011              …                 $\kappa_{1100.0011} = -s(p_{[13]}p_{[24]}+p_{[14]}p_{[23]}-2p_1p_2p'_3p'_4)$
11.10.10               …                 $\kappa_{11.10.10} = -s(p_1p_{(211)}+p_2p_{(111)}+p_{(11.)}p_{(2.1)}+p_{(1.1)}p_{(21.)})$
11.10.01               …                 $\kappa_{11.10.01} = -s(p_1p_{(212)}+p_2p_{(112)}+p_{(11.)}p_{(2.2)}+p_{(1.2)}p_{(21.)})$
100.100.011            …                 $\kappa_{100.100.011} = -s(p''_2p_{(113)}+p''_3p_{(112)}+p_{(1.2)}p_{(.13)}+p_{(1.3)}p_{(.12)})$
100.010.011            …                 $\kappa_{100.010.011} = -s(p''_2p_{(123)}+p''_3p_{(122)}+p_{(1.2)}p_{(.23)}+p_{(1.3)}p_{(.22)})$
1000.0100.0011         …                 $\kappa_{1000.0100.0011} = -s(p''_3p_{(124)}+p''_4p_{(123)}+p_{(1.3)}p_{(.24)}+p_{(1.4)}p_{(.23)})$
1.1.1.1                …                 $\kappa_{1.1.1.1} = sp_{(1111)}$
10.10.10.01            …                 $\kappa_{10.10.10.01} = sp_{(1112)}$
10.10.01.01            …                 $\kappa_{10.10.01.01} = sp_{(1122)}$
100.100.010.001        …                 $\kappa_{100.100.010.001} = sp_{(1123)}$
1000.0100.0010.0001    …                 $\kappa_{1000.0100.0010.0001} = sp_{(1234)}$










The above formulae were worked out directly, and have been checked by condensation
from the last one of each order. The rules are simple. Suppose, for example, we wish to derive
$\kappa_{10.10.01.01}$ from $\kappa_{1000.0100.0010.0001}$, a case in which the number of variates is unchanged. We
first put 2 = 1 and 4 = 3, and then write 3 as 2. This changes $p_{(1234)}$ into $p_{(1122)}$. Now suppose
we wish to derive $\kappa_{20.01.01}$ from $\kappa_{10.10.01.01}$: to do this we coalesce the first two variates. We
find that
\[ p_{1122} = p_{1.22}, \quad p_{112.} = p_{1.2.}, \quad p_{11.2} = p_{1..2}, \quad p_{11..} = p_{1...} = p_1. \]
We then suppress a dot in the first or second place, while $p'_1$ is read $p_1$, $p''_2$ is read $p'_2$, and $p'''_2$ is
read $p''_2$. We then have
\[
\begin{aligned}
\kappa_{20.01.01} &= s\{p_{122}-(p_{12.}p''_2+p_{1.2}p'_2+2p_{122}p_1)-(p_1p_{.22}+2p_{12.}p_{1.2})\\
 &\qquad +2(p_1p'_2p''_2+2p_{12.}p_1p''_2+2p_{1.2}p_1p'_2+p_{.22}p_1^2)-6p_1^2p'_2p''_2\}\\
 &= s\{p_{(122)}(q_1-p_1)-2p_{(12.)}p_{(1.2)}\}.
\end{aligned}
\]
On the other hand, if we want $\kappa_{10.11.01}$ from the same source, we must note that
\[ p_{1122} = p_{112.} = p_{.122} = p_{.12.} = 0. \]
In the remaining terms we then suppress a dot in the second or third place, while $p''_2$ is read $p'_2$
and $p'''_2$ is read $p''_2$. We then get
\[
\begin{aligned}
\kappa_{10.11.01} &= -s\{p_{112}p'_2+p_{122}p'_1+p_{11.}p_{.22}+p_{12.}p_{.12}\\
 &\qquad -2(p_{11.}p'_2p''_2+p_{12.}p'_1p''_2+p_{1.2}p'_1p'_2+p_{.12}p_1p'_2+p_{.22}p_1p'_1)+6p_1p'_1p'_2p''_2\}\\
 &= -s\{p'_1p_{(122)}+p'_2p_{(112)}+p_{(11.)}p_{(.22)}+p_{(12.)}p_{(.12)}\}.
\end{aligned}
\]
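These rules are obviously useful if higher order results are required, for from the most
general results of the second, third and fourth order we can conjecture similar results for
higher order and can then coalesce. Thus the general fifth-order result is $sp_{(12345)}$, where
\[
\begin{aligned}
p_{(12345)} = p_{12345} &- (p_{1234.}p^{\mathrm{iv}}_5+\ldots\ 5\text{ terms})\\
 &- (p_{123..}p_{...45}+\ldots\ 10\text{ terms})\\
 &+ 2(p_{123..}p'''_4p^{\mathrm{iv}}_5+\ldots\ 10\text{ terms})\\
 &+ 2(p_{12...}p_{..34.}p^{\mathrm{iv}}_5+\ldots\ 15\text{ terms})\\
 &- 6(p_{12...}p''_3p'''_4p^{\mathrm{iv}}_5+\ldots\ 10\text{ terms})\\
 &+ 24p_1p'_2p''_3p'''_4p^{\mathrm{iv}}_5.
\end{aligned}
\]
Also the general sixth order result is $sp_{(123456)}$, where
\[
\begin{aligned}
p_{(123456)} = p_{123456} &- (p_{12345.}p^{\mathrm{v}}_6+\ldots\ 6\text{ terms})\\
 &- (p_{1234..}p_{....56}+\ldots\ 15\text{ terms})\\
 &- (p_{123...}p_{...456}+\ldots\ 10\text{ terms})\\
 &+ 2!(p_{1234..}p^{\mathrm{iv}}_5p^{\mathrm{v}}_6+\ldots\ 15\text{ terms})\\
 &+ 2!(p_{123...}p_{...45.}p^{\mathrm{v}}_6+\ldots\ 60\text{ terms})\\
 &+ 2!(p_{12....}p_{..34..}p_{....56}+\ldots\ 15\text{ terms})\\
 &- 3!(p_{123...}p'''_4p^{\mathrm{iv}}_5p^{\mathrm{v}}_6+\ldots\ 20\text{ terms})\\
 &- 3!(p_{12....}p_{..34..}p^{\mathrm{iv}}_5p^{\mathrm{v}}_6+\ldots\ 45\text{ terms})\\
 &+ 4!(p_{12....}p''_3p'''_4p^{\mathrm{iv}}_5p^{\mathrm{v}}_6+\ldots\ 15\text{ terms})\\
 &- 5!\,p_1p'_2p''_3p'''_4p^{\mathrm{iv}}_5p^{\mathrm{v}}_6.
\end{aligned}
\]
It only remains to add that a similar set of cumulants to those of this section exist for the
corresponding Pascal distribution, in which $p$ is replaced by $a$, $q$ by $b$, and all the signs are
positive.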


REFERENCES 
FRISCH, R. (1925). C.R. Acad. Sci., Paris, 181, 274.
Goraas, P. (1936). Skand. AktuarTidskr. 19, 200. 
GULDBERG, S. (1935). Skand. AktuarTidskr. 18, 270.
HALDANE, J. B. S. (1940). Biometrika, 31, 392. 
QVALE, P. (1932). Skand. AktuarTidskr. 15, 196. 
WISHART, J. (1947). J. Inst. Actu. Stud. Soc. 6, 140.







ON THE WISHART DISTRIBUTION IN STATISTICS 
By A. C. AITKEN, D.Sc., F.R.S. 
Mathematical Institute, 16 Chambers Street, Edinburgh 


1. INTRODUCTORY


Wishart's distribution (Wishart, 1928; Wishart & Bartlett, 1933) is the probability distribu-
tion of the estimates of the $\tfrac12 k(k+1)$ moments of the second order, usually called variances
and covariances, from a sample of $n$ $k$-ary vectors drawn from a $k$-variate normal correlated
population. Let the probability differential of the population be
\[ dp = (2\pi)^{-\frac12 k}\,|V|^{-\frac12}\exp(-\tfrac12 x'V^{-1}x)\,dx \tag{1} \]
in matrix notation, where $V = [v_{ij}]$ is the variance matrix and $dx$ is the element of volume
in the variate space. It is assumed that $n$ sample vectors
\[ x_s = \{x_{1s}\ x_{2s}\ \ldots\ x_{ks}\} \quad (s = 1,2,\ldots,n) \tag{2} \]
have been drawn from (1). One evaluates in the usual way the $k$ sample means, $\bar x_j$
$(j = 1,2,\ldots,k)$ and then the $\tfrac12 k(k+1)$ estimates of the $v_{ij}$, namely,
\[ \hat v_{ij} = \frac{1}{n-1}\sum_{h=1}^{n}(x_{ih}-\bar x_i)(x_{jh}-\bar x_j). \tag{3} \]
Wishart's distribution is that of the $\hat v_{ij}$ and may be written in matrix notation as
\[ dp = c_{kn}\exp[-\tfrac12\operatorname{tr}(n-1)V^{-1}\hat V]\;|V|^{-\frac12(n-1)}\,|\hat V|^{\frac12(n-k-2)}\,d\hat v, \]
where
\[ c_{kn} = \{\Gamma(\tfrac12)\}^{-\frac12 k(k-1)}\{\tfrac12(n-1)\}^{\frac12 k(n-1)}\prod_{h=1}^{k}\{\Gamma\tfrac12(n-h)\}^{-1}. \tag{4} \]
Here $\hat V = [\hat v_{ij}]$, a necessarily positive definite matrix, tr $M$ means the trace or sum of diagonal
elements of $M$, and $d\hat v$ is the element of volume in the $\tfrac12 k(k+1)$-dimensional space of the $\hat v_{ij}$.

Various methods have been given for the derivation of this valuable distribution, which
evidently generalizes to the case of a symmetric and positive definite matrix variate the
familiar gamma distribution of a scalar variate. A review of these methods is given in a recent
paper (Wishart, 1948) by the original discoverer himself. His own first method was straight-
forward, to transform the $nk$-fold sample normal differential by introducing 'quadratic
co-ordinates', and to integrate away with respect to the undesired residual variables. Such
a procedure can be, and has been, described in geometrical language. The later method of
Wishart and Bartlett (1933) consisted in constructing the multiple Fourier transform of
the $\hat v_{ij}$ and reciprocating it by the use of an important lemma (Ingham, 1933) which generalized
Hankel's contour integral for the $\Gamma$-function to the case of $\tfrac12 k(k+1)$ variables, elements of
a positive definite Hermitian matrix. Yet again, other methods (e.g. Hsu, 1939) depend on
an induction. But we may refer to Wishart's paper (1948) for further information on the
history of the problem.

Now it would seem that, whatever derivation is adopted, one can hardly expect to avoid
an encounter with the analytical theory of non-negative definite quadratic and bilinear
forms; and in fact a fundamental lemma, which stands to Ingham's lemma as Euler's integral
for the $\Gamma$-function stands to Hankel's contour integral, is to be found in a paper on the above
theory by Siegel (1935). This lemma is quoted, proved differently, and used in another
context by Gårding (1947).

We establish this lemma in §3. It is necessary to mention in the first place some general 
considerations on matrix transformations and associated vector transformations. 


2. MATRIX TRANSFORMATIONS AND RELATED VECTOR TRANSFORMATIONS 


Let $X$ be an arbitrary rectangular matrix of order $m\times n$. Let us suppose its elements written
down, row after row, as the $mn$ ordered elements of a vector
\[ \xi = \{x_{11}\ x_{12}\ \ldots\ x_{1n}\ x_{21}\ x_{22}\ \ldots\ x_{2n}\ x_{31}\ \ldots\ x_{mn}\}. \tag{1} \]
Now let $Y = H'XK$, where $H$ and $K$ may be rectangular, and let a vector $\eta$ be written
down for $Y$ in the same way as (1). By inspection of the $(i,j)$th element of $Y$, namely,
$\sum_{k,l}h_{ki}x_{kl}k_{lj}$, and of the coefficient of $x_{kl}$ in it, namely $k_{lj}h_{ki}$, we see that
\[ \eta = (K'\times H')\xi, \tag{2} \]
the transforming matrix being the direct product. In particular, if $H$ is of order $m\times m$, and
$K$ of order $n\times n$, the Jacobian of the transformation (2), by a well-known result, is equal to
$|H|^{n}|K|^{m}$.

Next consider $X = X'$, of order $n\times n$, congruently transformed to $Y = H'XH$. Let $\xi$ be
now the vector made from the $\tfrac12 n(n+1)$ elements in and above the diagonal of $X$, written
down row after row as before; and let $\eta$ be written down likewise for $Y$. By a similar inspec-
tion of the typical element in $H'XH$, account being taken of the fact that $k_{ij}$ in (2) is now $h_{ij}$,
we arrive at the very useful result
\[ \eta = H'^{[2]}\xi. \tag{3} \]
The transforming matrix here is the 'second induced' or 'second Schläflian' matrix of $H$,
sometimes called the 'symmetrized direct square'; namely, the matrix which, when $H$
transforms a vector, transforms the squares and binary products, duly ordered, of elements
of that vector. Again, by a well-known result, the Jacobian of the transformation (3) is
$|H|^{n+1}$.
3. THE LEMMA OF SIEGEL AND ITS APPLICATION 


The lemma is as follows: given $S$, an arbitrary positive definite real symmetric matrix, and $T$,
a variable positive definite real symmetric matrix, both of order $k\times k$. Integration being
over the domain of positive-definiteness of $T$, it is asserted that*
\[ \int\exp(-\operatorname{tr}ST)\,|T|^{n-\frac12(k+1)}\,dt = \{\Gamma(\tfrac12)\}^{\frac12 k(k-1)}\prod_{h=0}^{k-1}\Gamma(n-\tfrac12 h)\,|S|^{-n}. \tag{1} \]
We follow Gårding in employing the 'triangular matrix' transformation. First, we remark
that $S$ may be factorized uniquely into $HH'$, where $H$ is 'positive lower triangular' (p.l.t.),
i.e. $x_{ii} > 0$, $x_{ij} = 0$, $i < j$. For it is known that a positive definite symmetric matrix $S$ may be
expressed as $MM'$, where $M$ is a real matrix, with an indeterminacy in respect of $M$, in that
$MK$, where $K$ is orthogonal, would serve as well. However, if $M$ is l.t., as can be the case,
since any matrix can be reduced orthogonally to triangular shape, and $MK$ is likewise l.t.,
then $K$ is l.t. Now the only orthogonal matrix that is l.t. is $I$, the unit matrix; and so the
p.l.t. resolution of $S$ is unique. A similar result holds for positive upper triangular factoriza-
tion; and the extension to Hermitian matrices, in the shape $H\bar H'$, is evident.
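
The positive lower triangular factor can be computed row by row, which makes its uniqueness concrete: each element is forced once the earlier ones are fixed. The following is a minimal sketch of the usual Cholesky recursion in plain Python; the function name `positive_lower_triangular` and the sample matrix are illustrative and do not come from the paper.

```python
from math import sqrt

def positive_lower_triangular(S):
    """Return the lower triangular H with positive diagonal such that H H' = S.

    S is a symmetric positive definite matrix given as a list of rows.
    """
    k = len(S)
    H = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            # Remove the contribution of the columns already determined.
            acc = S[i][j] - sum(H[i][m] * H[j][m] for m in range(j))
            if i == j:
                if acc <= 0:
                    raise ValueError("S is not positive definite")
                H[i][j] = sqrt(acc)      # the positive root is the only admissible choice
            else:
                H[i][j] = acc / H[j][j]
    return H

def times_transpose(H):
    """Form H H' so that the factorization can be checked."""
    k = len(H)
    return [[sum(H[i][m] * H[j][m] for m in range(k)) for j in range(k)] for i in range(k)]

if __name__ == "__main__":
    S = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
    H = positive_lower_triangular(S)
    print(H)
    print(times_transpose(H))
```

Because $|S| = |H|^2$ is the squared product of the diagonal elements of $H$, the same factor also supplies the determinants that appear in the Jacobians used below.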


* The Editors kindly recall to me that Siegel’s lemma, in an equivalent form, is stated and used 
in Cramér’s Methods of Mathematical Statistics (1946), pp. 390-4. 



















Now in (1) put $S = HH'$, $H'TH = U = XX'$, where $X$ is p.l.t. also. We then have
$\operatorname{tr}(ST) = \operatorname{tr}(HUH^{-1}) = \operatorname{tr}U = \operatorname{tr}(XX')$ = sum of squares of all $x_{ij}$. The Jacobian of the
transformation $U\to X$ is again l.t., and so is equal to the product of its diagonal elements,
namely $2^{k}x_{11}^{k}x_{22}^{k-1}x_{33}^{k-2}\ldots x_{kk}$. The Jacobian of the transformation $T\to U$ is the reciprocal of
$|H'^{[2]}|$, namely, $|H|^{-(k+1)} = |S|^{-\frac12(k+1)}$. Thus the left-hand member of (1) is reduced to
\[
2^{k}\int\exp(-\operatorname{tr}XX')\,|S|^{-n+\frac12(k+1)}\,|S|^{-\frac12(k+1)}\,x_{11}^{2n-1}x_{22}^{2n-2}\ldots x_{kk}^{2n-k}\,d\xi
 = \{\Gamma(\tfrac12)\}^{\frac12 k(k-1)}\prod_{h=0}^{k-1}\Gamma(n-\tfrac12 h)\,|S|^{-n}, \tag{2}
\]
since the integral has been resolved into the product of $\tfrac12 k(k+1)$ integrals of familiar type.
It will be interesting at this stage to invert the order of demonstration and to obtain by
Siegel's lemma the multiple Laplace transform of Wishart's function. The integrand will be
\[
\begin{aligned}
& c\exp(-\operatorname{tr}S\hat V)\exp\{-\tfrac12\operatorname{tr}(n-1)V^{-1}\hat V\}\,|V|^{-\frac12(n-1)}\,|\hat V|^{\frac12(n-k-2)}\\
&\quad = c\exp\Bigl\{-\operatorname{tr}\tfrac12(n-1)V^{-1}\Bigl(I+\frac{2VS}{n-1}\Bigr)\hat V\Bigr\}\,|V|^{-\frac12(n-1)}\,|\hat V|^{\frac12(n-k-2)}.
\end{aligned}
\tag{3}
\]
Thus the transform in question is
\[ \Bigl|\,I+\frac{2VS}{n-1}\Bigr|^{-\frac12(n-1)}, \tag{4} \]
since all the $\Gamma$-functions indicated in $c$ cancel out; or we might merely have observed that in
a Laplace transform, or moment-generating function (m.g.f.) of a probability function the
constant term must be unity. Now this m.g.f. in (4) is known otherwise (Aitken, 1931;
Wishart & Bartlett, 1933) to be indeed the Laplace transform or m.g.f. of the $\hat v_{ij}$. Hence,
provided reciprocation is unique, we may pass from this m.g.f. to Wishart's function by the
use of Siegel's lemma. This uniqueness is assured by the boundedness and continuity of the
integrand in any domain of the space.


4. THE MOMENT-GENERATING FUNCTION OF THE ESTIMATES 


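We conclude by giving a revised version of the derivation of the m.g.f. of the $\hat v_{ij}$. Assigning
$s_{ii}$ as moment-carrying variable for $\hat v_{ii}$, and $2s_{ij}$ for $\hat v_{ij}$ $(i\neq j)$, we have for the m.g.f. the
$nk$-fold integral
\[ c\int\exp\Bigl[-\tfrac12\sum_h x_h'V^{-1}x_h-\frac{1}{n-1}\sum_{i,j}s_{ij}\Bigl\{\sum_h x_{ih}x_{jh}-\frac1n\Bigl(\sum_h x_{ih}\Bigr)\Bigl(\sum_h x_{jh}\Bigr)\Bigr\}\Bigr]\,dz. \tag{1} \]
The quadratic form in the exponent of the integrand is $\tfrac12 z'Qz$, where $Q$ is a positive-definite
matrix partitioned into submatrices of order $k\times k$, thus,
\[
Q = \begin{bmatrix}
V^{-1}+\dfrac{2S}{n} & -\dfrac{2S}{n(n-1)} & \cdots & -\dfrac{2S}{n(n-1)}\\[2mm]
-\dfrac{2S}{n(n-1)} & V^{-1}+\dfrac{2S}{n} & \cdots & -\dfrac{2S}{n(n-1)}\\
\vdots & \vdots & & \vdots\\
-\dfrac{2S}{n(n-1)} & -\dfrac{2S}{n(n-1)} & \cdots & V^{-1}+\dfrac{2S}{n}
\end{bmatrix}. \tag{2}
\]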














We transform this to $HQH^{-1}$, where
\[
H = \begin{bmatrix}
I & -I & \cdot & \cdots & \cdot\\
\cdot & I & -I & \cdots & \cdot\\
\vdots & & \ddots & \ddots & \vdots\\
\cdot & \cdot & \cdots & I & -I\\
I & I & \cdots & I & I
\end{bmatrix}, \tag{3}
\]
where $HQH^{-1}$ will be found to show $n-1$ submatrices $V^{-1}\Bigl(I+\dfrac{2VS}{n-1}\Bigr)$ isolated down the
diagonal, and a last row of submatrices not involving $S$. So, using the standard result on the
integral of $\exp(-\tfrac12 x'Qx)$, and recalling again that in any m.g.f. the constant term is 1, we
arrive at the determinantal value
\[ \Bigl|\,I+\frac{2VS}{n-1}\Bigr|^{-\frac12(n-1)}, \]
and now reciprocation can proceed.


5. MODIFICATION OF WISHART'S DISTRIBUTION


In the case of each variate $x_j$ we took the mean $\bar x_j$, and estimated the $\hat v_{ij}$ in terms of deviations
from these means. In other words, we fitted constants, namely, means, by least squares to
each set of $n$ observations of a variate $x_j$. But cases are encountered in which not a constant,
but a polynomial of assigned degree, or so many harmonic terms, or some linear combination
of arbitrary independent functions, or the like, are fitted to the variates, the $\hat v_{ij}$ being then
estimated from the residuals after the fitting. In an earlier paper (Aitken, 1946) we have
examined such cases in some detail and have shown that, provided all variates are fitted to
the same orthonormal functional basis of $l$ independent functions, the distribution of the
$\hat v_{ij}$ has m.g.f.
\[ \Bigl|\,I+\frac{2VS}{n-l}\Bigr|^{-\frac12(n-l)}, \]
as one might indeed expect; but notice must be taken (ibid.) of strict conditions governing
the functional representations. When these are satisfied, and the residuals are assumed to
be a normal sample, Wishart's distribution still holds, with $n-1$ replaced by $n-l$.


REFERENCES 


AITKEN, A. C. (1931). Some applications of generating functions to normal frequency. Quart. J. Math.
(Oxford Series), 2, 130.

AITKEN, A. C. (1946). On a problem in correlated errors. Proc. Roy. Soc. Edinb. 62, 273.

GÅRDING, LARS (1947). The solution of Cauchy's problem for two totally hyperbolic linear differential
equations by means of Riesz integrals. Ann. Math. 48, 785.

HSU, P. L. (1939). A new proof of the product-moment distribution. Proc. Camb. Phil. Soc. 35, 336.

INGHAM, A. E. (1933). An integral which occurs in statistics. Proc. Camb. Phil. Soc. 29, 271.

SIEGEL, C. L. (1935). Ueber die analytische Theorie der quadratischen Formen. Ann. Math. 36, 527.

WISHART, J. (1928). The generalized product-moment distribution in samples from a normal popula-
tion. Biometrika, 20 A, 32.

WISHART, J. (1948). Proofs of the distribution law of the second order moment statistics.
Biometrika, 35, 55.

WISHART, J. & BARTLETT, M. S. (1933). The generalized product-moment distribution in a normal
system. Proc. Camb. Phil. Soc. 29, 260.



















THE SPECTRAL THEORY OF DISCRETE STOCHASTIC PROCESSES 
By P. A. P. MORAN, Institute of Statistics, Oxford University 


Consider a stationary stochastic process defined by a sequence $\{x_t\}$ of continuous variates,
where $t = 0, \pm1, \ldots$. Wold (1938) has proved the following fundamental theorem, which is
the analogue for discrete processes of the theorem proved for continuous processes by
Khintchine (1934).

WOLD'S THEOREM. Let $\rho_k$ $(k = 0, \pm1, \ldots)$ be an arbitrary sequence of constants. Then
a necessary and sufficient condition that there exists a discrete stationary process with these
as serial correlation coefficients, is that there exists a non-decreasing function $w(\theta)$ such that
$w(0) = 0$, $w(\pi) = \pi$ and
\[ \rho_k = \frac{1}{\pi}\int_0^{\pi}\cos k\theta\,dw(\theta). \tag{1} \]
$w(\theta)$ is then known as the integrated power spectrum of the process and is fundamental in the
theory of generalized harmonic analysis. When $w'(\theta)$ exists, it is known as the spectral density.
In the present paper we shall show how the study of the spectrum of a process and of a
covariance generating function introduced by Quenouille (1947) can be used to simplify the
theory of such processes and, in particular, to provide short proofs of the theorems of Slutzky
(1937) and Romanovsky (1932, 1933).

Wold shows that when $w'(\theta)$ exists in the interval $(0, \pi)$ it is given by
\[ w'(\theta) = 1+2\sum_{k=1}^{\infty}\rho_k\cos k\theta. \tag{2} \]
$w'(\theta)$ will always exist for processes generated by taking an infinite moving average of a
completely random process, whose weights form an absolutely convergent series. The series
used in most studies of time series are of this type, for they are either finite moving averages
of completely random series or the solutions of stationary stochastic difference equations.
The latter are equivalent to an infinite moving average whose weights are dominated by
a convergent geometric series. For simplicity it is convenient in what follows to restrict our-
selves to series generated by taking a (possibly infinite) moving average whose weights are
known to be dominated by a convergent geometric series. This restriction is convenient but
not essential.


If we write $z = e^{i\theta}$ in (2), we obtain a function first introduced by Quenouille which gener-
ates the serial correlations. This is $\sum_{k=-\infty}^{\infty}\rho_k z^{k}$. We multiply this by $E(x_t^2)$, thus replacing the
serial correlation coefficients $\rho_k$ by the serial covariances $c_k = c_{-k} = E(x_t x_{t+k})$. We then have
\[ S(z) = \sum_{k=-\infty}^{\infty}c_k z^{k}, \tag{3} \]
which we call the covariance generating function. When the process is generated by a moving
average whose weights are dominated by a convergent geometric series, the series (3) will also
be dominated by a convergent geometric series and will be a Laurent series convergent in an
annulus $1-\delta < |z| < 1+\delta$. The coefficients $c_k$ will therefore be uniquely determined by $S(z)$.
It is convenient to call such processes 'Laurent processes'. In the more general case where the
weights and therefore the $\rho_k$ are only known to form an absolutely convergent series, we have
to use theorems on the uniqueness of Fourier series. Even in this case it is algebraically more
convenient to write (3) as a Laurent series, although we may only know that it converges on
$|z| = 1$. We also note that $S(e^{i\theta}) = c_0 w'(\theta)$.

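Now suppose we have such a process $\{x_t\}$ with serial covariances $c_k$ and covariance generat-
ing function $S(z)$. Define a new process $\{y_t\}$ by
\[ y_t = \sum_{i=0}^{\infty}a_i x_{t-i}, \tag{4} \]
where $\sum_{i=0}^{\infty}a_i$ is dominated by a convergent geometric series. Then it is not difficult to prove
that (4) is convergent with probability one and defines a stationary Laurent process. Write
\[ C_k = C_{-k} = E(y_t y_{t+k}). \]
Then
\[ S_y(z) = \sum_{k=-\infty}^{\infty}C_k z^{k} = \sum_{k=-\infty}^{\infty}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty}a_i a_j c_{k+i-j}z^{k}
   = \Bigl(\sum_{i=0}^{\infty}a_i z^{-i}\Bigr)\Bigl(\sum_{i=0}^{\infty}a_i z^{i}\Bigr)S(z). \tag{5} \]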


i=0 
This is the fundamental formula in what follows and shows the effect on the covariance 
generating function of taking a moving average of the process. 
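Formula (5) can be checked numerically for finite moving averages. The following is a minimal sketch, assuming finitely many weights and covariances stored as arrays centred on lag 0; the function name `filtered_cgf` is illustrative and not from the paper. Multiplying Laurent polynomials is done by convolving their coefficient arrays.

```python
import numpy as np

def filtered_cgf(a, c):
    """Coefficients of S_y(z) = (sum a_i z^-i)(sum a_i z^i) S(z), as in (5).

    a : weights a_0..a_q of the moving average y_t = sum_i a_i x_{t-i}
    c : covariances c_{-K}..c_{K} of the x-process (symmetric, odd length)
    Returns C_{-(K+q)}..C_{K+q}, centred on lag 0.
    """
    a = np.asarray(a, dtype=float)
    c = np.asarray(c, dtype=float)
    # a[::-1] lists the coefficients of sum a_i z^-i in increasing powers
    # (z^-q .. z^0); a itself lists those of sum a_i z^i (z^0 .. z^q).
    kernel = np.convolve(a[::-1], a)   # powers z^-q .. z^q
    return np.convolve(kernel, c)      # powers z^-(K+q) .. z^(K+q)

# Completely random input (c.g.f. = 1) passed through y_t = x_t + 0.5 x_{t-1}.
C_random = filtered_cgf([1.0, 0.5], [1.0])

# Correlated input with c_0 = 1, c_1 = 0.3, passed through y_t = x_t + x_{t-1}.
C_correlated = filtered_cgf([1.0, 1.0], [0.3, 1.0, 0.3])
print(C_random, C_correlated)
```

The second call can be verified by hand: $C_0 = 2c_0 + 2c_1$ and $C_1 = c_0 + 2c_1 + c_2$.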


APPLICATION TO STOCHASTIC DIFFERENCE EQUATIONS 
Quenouille (1947) has used this type of result to discuss the solutions of stochastic difference equations and in particular to calculate $\sum_{k=-\infty}^{\infty} \rho_k \rho_{k+j}$, which is useful in calculating the covariance between two sample serial correlation coefficients of different orders. Thus if we have an equation
$$x_t + a_1 x_{t-1} + \dots + a_k x_{t-k} = \eta_t, \qquad (6)$$
where $\{\eta_t\}$ is a completely random process and the equation
$$z^k + a_1 z^{k-1} + \dots + a_k = 0 \qquad (7)$$
has all its roots inside the circle $|z| = 1$, we can see that $\{x_t\}$ is a Laurent process. For multiplying (6) by $x_{t-s}$ $(s > 0)$ and taking expectations we get
$$c_s + a_1 c_{s-1} + \dots + a_k c_{s-k} = 0,$$
and the condition on (7) implies that the solutions of this are dominated by the terms of a convergent geometric series. If $S_x(z)$ is the covariance generating function of $\{x_t\}$ we then have, since the c.g.f. of $\{\eta_t\}$ is 1,
$$1 = (1 + a_1 z + \dots + a_k z^k)(1 + a_1 z^{-1} + \dots + a_k z^{-k})\, S_x(z),$$
and so
$$S_x(z) = [(1 + a_1 z + \dots + a_k z^k)(1 + a_1 z^{-1} + \dots + a_k z^{-k})]^{-1}$$
and
$$c_0 w'(\theta) = [(1 + a_1^2 + \dots + a_k^2) + 2(a_1 + a_1 a_2 + \dots + a_{k-1} a_k)\cos\theta + \dots + 2a_k \cos k\theta]^{-1}.$$









P. A. P. Moran 65 
We notice that, if $\{x_t\}$ is a process generated by the above relation (6), then
$$\zeta_t = x_t + a_1 x_{t+1} + \dots + a_k x_{t+k}$$
has a c.g.f. equal to unity and zero serial covariances and so is completely random if the
process $\{x_t\}$ is Gaussian. This result is otherwise obvious, but is of interest in showing that
processes of this type can be regarded as completely reversible.

The above results are also of interest in throwing some light on what happens when we 
define a process by means of an equation like (6) but drop the condition that the $\{\eta_t\}$ are
random. If we know the c.g.f. of the $\{\eta_t\}$ process we can find that of the $\{x_t\}$ immediately by
using (5). This is of value in studying multivariate processes, and Quenouille has suggested
that these also be studied by using generating functions with several parameters instead of 
a single z. It is probably more illuminating, however, to use matrix notation and a single 
parameter. Since, apart from an investigation on sampling properties by Mann & Wald 
(1943), the theory of multivariate stochastic difference equations has not been set out in 
detail, we shall do so in the last section of this paper. 


SLUTZKY'S THEOREM 


Equation (5) enables us to give much shorter proofs of Slutzky's sinusoidal limit theorem (1937) and its generalizations (Romanovsky, 1932, 1933). Consider a completely random series $\{x_t\}$ with finite variance and perform the operation of taking a moving sum of two, $x_t^{(1)} = x_t + x_{t-1}$, $n$ times. Then take the $m$th difference of the resulting series. If we take a finite section $X_1, \dots, X_N$ of the resulting series, this will differ from a sine wave with more than any given relative error with a probability that tends to zero when $n$ tends to infinity and $mn^{-1}$ tends to a constant $A$ such that $0 < A < 1$. The period $L$ of the sine wave will be given by
$$\cos 2\pi L^{-1} = \frac{1-A}{1+A}. \qquad (8)$$


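The c.g.f. of a completely random series is unity, and so Slutzky's series will be a Laurent process whose c.g.f. is
$$S(z) = (1 - z^{-1})^m (1 - z)^m (1 + z^{-1})^n (1 + z)^n,$$
and putting $z = e^{i\theta}$ we get
$$c_0 w'(\theta) = S(z) = 2^{2(m+n)} \sin^{2m}(\tfrac12\theta) \cos^{2n}(\tfrac12\theta) = 2^{m+n} (1 - \cos\theta)^m (1 + \cos\theta)^n.$$
Then
$$c_0 = \frac{2^{2(m+n)}}{\pi} \int_0^{\pi} \sin^{2m}(\tfrac12\theta) \cos^{2n}(\tfrac12\theta)\, d\theta = \frac{2^{2(m+n)}\, \Gamma(m+\tfrac12)\, \Gamma(n+\tfrac12)}{\pi\, \Gamma(m+n+1)}.$$
Therefore
$$w'(\theta) = \frac{\pi\, \Gamma(m+n+1)}{\Gamma(m+\tfrac12)\, \Gamma(n+\tfrac12)} \sin^{2m}(\tfrac12\theta) \cos^{2n}(\tfrac12\theta) = \frac{\pi\, 2^{-(m+n)}\, \Gamma(m+n+1)}{\Gamma(m+\tfrac12)\, \Gamma(n+\tfrac12)} (1 - \cos\theta)^m (1 + \cos\theta)^n. \qquad (9)$$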




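Slutzky's result arises from the fact that $w'(\theta)$ has a single peak in the region of which most of the spectral density is concentrated, and so $w(\theta)$ tends to a step function. The peak occurs at the point $\theta_0$ where $w''(\theta) = 0$. Thus
$$\cos\theta_0 = \frac{n-m}{n+m} \to \frac{1-A}{1+A}.$$
To prove that $w(\theta)$ tends to a step function we show that, as $n$ tends to infinity and $mn^{-1}$ tends to $A$,
$$\lim w'(\theta) = 0 \quad \text{if } \theta \ne \cos^{-1}\frac{n-m}{n+m}, \qquad \lim w'(\theta) = \infty \quad \text{if } \theta = \cos^{-1}\frac{n-m}{n+m}.$$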


Consider the asymptotic behaviour of (9) for $m$ and $n$ large. For large $x$, $\Gamma(x)$ behaves like $(2\pi)^{1/2} e^{-x} x^{x-1/2}$, and so $w'(\theta)$ is asymptotically equal to
$$\frac{\pi\, 2^{-m-n}\, e^{-m-n-1} (m+n+1)^{m+n+1/2}}{(2\pi)^{1/2}\, e^{-m-1/2} (m+\tfrac12)^m\, e^{-n-1/2} (n+\tfrac12)^n} (1 - \cos\theta)^m (1 + \cos\theta)^n = (\tfrac12\pi)^{1/2}\, \frac{2^{-m-n} (m+n+1)^{m+n+1/2}}{(m+\tfrac12)^m (n+\tfrac12)^n} (1 - \cos\theta)^m (1 + \cos\theta)^n.$$
Now put $\cos\theta = \dfrac{n-m}{n+m} + p$, where $-\dfrac{2n}{m+n} \le p \le \dfrac{2m}{m+n}$. Then when $p = 0$, $w'(\theta)$ is asymptotically equal to
$$(\tfrac12\pi)^{1/2} (m+n+1)^{1/2},$$
which tends to infinity as $n$ increases and $mn^{-1}$ tends to $A$. If $p \ne 0$, $w'(\theta)$ is asymptotically equal to
$$(\tfrac12\pi)^{1/2} (m+n+1)^{1/2} \left\{1 - \frac{p(m+n)}{2m}\right\}^m \left\{1 + \frac{p(m+n)}{2n}\right\}^n,$$
and this may be verified to tend to zero as $n$ tends to infinity and $mn^{-1}$ to $A$, uniformly in any closed interval excluding $\theta_0 = \cos^{-1}\dfrac{1-A}{1+A}$. It follows that $w(\theta)$ tends to a function with a single step at
$$\theta_0 = \cos^{-1}\frac{1-A}{1+A}$$
of magnitude $\pi$ and $\rho_s$ tends to $\cos s\theta_0$.
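The location of the spectral peak can be illustrated numerically. The sketch below evaluates the logarithm of $(1-\cos\theta)^m (1+\cos\theta)^n$ on a grid (the normalizing constant in (9) does not move the maximum) and compares the maximizing angle with $\cos^{-1}\{(n-m)/(n+m)\}$; the values of $m$ and $n$ and the grid size are arbitrary choices for illustration.

```python
import numpy as np

def log_density(theta, m, n):
    """log of (1 - cos theta)^m (1 + cos theta)^n, i.e. log w'(theta) up to a constant."""
    return m * np.log1p(-np.cos(theta)) + n * np.log1p(np.cos(theta))

m, n = 30, 90                      # so that m/n = A = 1/3
theta = np.linspace(1e-3, np.pi - 1e-3, 200001)
peak = theta[np.argmax(log_density(theta, m, n))]
theta0 = np.arccos((n - m) / (n + m))
period = 2 * np.pi / theta0        # period L of the limiting sine wave, from (8)
print(peak, theta0, period)
```

With $A = 1/3$ the right-hand side of (8) is $1/2$, so $\theta_0 = \pi/3$ and the limiting period is 6 observations.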
We now show that this implies that a given stretch $X_1, \dots, X_N$ of the final series tends, in probability, to a sine wave as $n$ increases, $mn^{-1}$ tends to $A$, and $N$ is kept fixed. For consider
$$T = \sum_{i=1}^{N-2} (X_{i+2} - 2\rho_1 X_{i+1} + X_i)^2.$$
Then
$$E(T) = 2(N-2)\, c_0 (1 - 2\rho_1^2 + \rho_2).$$
As $n$ increases, $c_0^{-1} E(T)$ tends to
$$2(N-2)(1 - 2\cos^2\theta_0 + \cos 2\theta_0) = 0.$$
Given $\epsilon$ small we can choose $n$ so large that
$$N c_0^{-1} E(T) < \epsilon^3,$$
and so
$$\Pr\{NT > c_0 \epsilon^2\} < \epsilon,$$
and
$$\Pr\Big\{\sum_{i=1}^{N-2} |X_{i+2} - 2\rho_1 X_{i+1} + X_i| > c_0^{1/2} \epsilon\Big\} < \epsilon.$$





Thus
$$X_{i+2} - 2\rho_1 X_{i+1} + X_i = 0$$
is approximately satisfied for $i = 1, \dots, N-2$, and the probability that the sequence $X_1, \dots, X_N$ will differ from a sine wave by more than a small relative error will be small; for the series $\{X_i\}$, $i = 1, \dots, N$, can then be regarded as the sum of a 'complementary function', which is a sine wave, and a 'particular integral' which will, in general, be small with $\epsilon$.
Romanovsky (1932, 1933) has generalized Slutzky's theorem in various ways. If we take a moving average of $s$ $(> 2)$ terms and repeat this $n$ times, differencing the results $m$ times, we again get a sinusoidal limit theorem and $S(z)$ will be given by
$$S(z) = (1 - z^{-1})^{m-n} (1 - z)^{m-n} (1 - z^{-s})^n (1 - z^s)^n.$$
This simplifies a good deal of Romanovsky's proof, but the discussion of the limiting behaviour of $w'(\theta)$ is then not so simple as above, and it is necessary to use Romanovsky's lemma I (1932, p. 85) to complete the proof. In a second paper Romanovsky proves that the result still holds when the process of averaging and differencing is applied to a series which is not random provided that
$$\sum_{k=1}^{\infty} \rho_k \cos k\theta$$
is uniformly convergent in $(0, \pi)$ and
$$1 + 2\sum_{k=1}^{\infty} \rho_k \cos k\theta_0 \ne 0, \qquad (10)$$
where $\theta_0$ is the value of $\theta$ at which $w(\theta)$ tends to have a step in the completely random case, as $m$ and $n$ increase. This result follows at once from the above discussion, the operator
$$(1 - z^{-1})^{m-n} (1 - z)^{m-n} (1 - z^{-s})^n (1 - z^s)^n$$
being applied to the c.g.f. of the $x$'s and the condition (10) expressing the fact that $S(z)$ must not have a zero at $z = e^{i\theta_0}$, the result being true even when $\sum \rho_k$ is only known to be absolutely convergent.

The basic reason for the truth of such theorems as these is worth a little attention. The 
effect of repeating a moving average with positive weights is to generate a longer moving 
average whose weights can be approximated by the ordinates of a normal distribution. The 
effect on this of taking the mth difference is to generate a moving average whose weights are 
approximated to by the ordinates of the mth derivative of the normal distribution, that is, 
the mth tetrachoric function. These ordinates themselves mimic the oscillations of a sine 
wave,* and thus Slutzky’s operations have resulted in a moving average with weights 
oscillating with a period equal to that of the resulting nearly sinusoidal process. 
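The weights of the combined moving average can be written down exactly as the coefficients of $(1+z)^n (1-z)^m$, and the serial correlations of the resulting process follow from them. The sketch below uses exact integer arithmetic; the helper names and the choice $n = 60$, $m = 20$ are illustrative assumptions, not part of the original argument.

```python
from fractions import Fraction
from math import acos, cos

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def slutzky_weights(n, m):
    """Weights after n moving sums of two followed by m differences."""
    w = [1]
    for _ in range(n):
        w = poly_mul(w, [1, 1])
    for _ in range(m):
        w = poly_mul(w, [1, -1])
    return w

def serial_correlation(w, s):
    """rho_s of the moving average with weights w applied to a random series."""
    c0 = sum(x * x for x in w)
    cs = sum(w[i] * w[i + s] for i in range(len(w) - s))
    return Fraction(cs, c0)

n, m = 60, 20
w = slutzky_weights(n, m)
theta0 = acos((n - m) / (n + m))
for s in (1, 2, 3):
    # Exact rho_s next to the limiting value cos(s * theta0).
    print(s, float(serial_correlation(w, s)), cos(s * theta0))
```

The weights change sign in an oscillating pattern, and the exact serial correlations are already close to $\cos s\theta_0$ for these moderate values of $n$ and $m$.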

By the use of another parameter $w$ as well as $z$ it is then trivially easy to generalize this type of theorem to the case where we have a set of random variables $\{x_{ij}\}$ $(i, j = \dots, -1, 0, 1, \dots)$ arranged in a lattice. If we take repeated moving averages of the form
$$x_{ij}^{(1)} = x_{ij} + x_{i-1,j} + x_{i,j-1} + x_{i-1,j-1}$$
and then repeated differences of the form
$$x_{ij}^{(1)} = x_{ij} - x_{i-1,j} - x_{i,j-1} + x_{i-1,j-1}$$
we arrive at a process which mimics the product of two sine waves, one along the $i$ axis and one along the $j$ axis. In this way approximate solutions of partial differential equations in two or more dimensions can be built up out of random elements, but such solutions do not satisfy prescribed boundary conditions and seem to be relatively trivial.


* See, for example, Szegő (1939, p. 194, formula (8.22.8)).


THEORY OF MULTIVARIATE STOCHASTIC PROCESSES
Consider a stationary $p$-dimensional process defined by a random column vector $\mathbf{x}_t$ $(t = 0, \pm 1, \dots)$ with $p$ components and a transpose
$$\mathbf{x}_t' = (x_t^1, \dots, x_t^p).$$
For given $s$ $(= 0, \pm 1, \dots)$ we define the matrix
$$(c_s^{jk}) = E(\mathbf{x}_t \mathbf{x}_{t+s}') = (c_{-s}^{jk})',$$
where
$$c_s^{jk} = E(x_t^j x_{t+s}^k) = c_{-s}^{kj}.$$
We write $c_0^{jj} = \sigma_j^2$ and
$$\rho_s^{jk} = \frac{c_s^{jk}}{\sigma_j \sigma_k}.$$


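Then Cramér (1940) has shown that there exist $p^2$ (possibly complex) functions $w_{jk}(\theta)$ defined for $-\pi \le \theta \le \pi$, which are of bounded variation and such that
$$\rho_s^{jk} = \int_{-\pi}^{\pi} e^{is\theta}\, dw_{jk}(\theta).$$
For $j = k$, $w_{jk}(\theta)$ is real and non-decreasing. From this it then follows, as before, that if the series $\sum_{s=-\infty}^{\infty} \rho_s^{jk}$ is absolutely convergent, $w_{jk}'(\theta)$ exists and is given by
$$w_{jk}'(\theta) = \frac{1}{2\pi} \sum_{s=-\infty}^{\infty} \rho_s^{jk} e^{-is\theta}.$$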
We write $z = e^{i\theta}$ and put
$$S_{jk}(z) = \sum_{s=-\infty}^{\infty} c_s^{jk} z^s,$$
and we define the matrix covariance generating function
$$S(z) = \Big(\sum_{s=-\infty}^{\infty} c_s^{jk} z^s\Big).$$
In all the applications with which we shall be concerned the series in each element of this matrix will be convergent in an annulus $1 - \delta < |z| < 1 + \delta$. The matrix is therefore an analytic matrix function of $z$ in such an annulus.


Now define another vector process $\{\mathbf{y}_t\}$ as a moving matrix average of the $\mathbf{x}$'s by the equation
$$\mathbf{y}_t = A_0 \mathbf{x}_t + A_1 \mathbf{x}_{t-1} + \dots,$$
where each $A_i$ is a $p \times p$ matrix. It is then not difficult to show, by Kolmogoroff's theorem on the probability of convergence of infinite random series, that $\{\mathbf{y}_t\}$ will be a well-defined random process if each of the $p^2$ series formed by the $(i,j)$th element of the $A$'s is absolutely convergent. Write $(C_s^{jk})$ for the serial covariance matrix of order $s$ of the $\mathbf{y}$'s. Then
$$(C_s^{jk}) = E(\mathbf{y}_t \mathbf{y}_{t+s}') = E\Big[\Big(\sum_{m=0}^{\infty} A_m \mathbf{x}_{t-m}\Big)\Big(\sum_{n=0}^{\infty} \mathbf{x}_{t+s-n}' A_n'\Big)\Big] = \sum_{m=0}^{\infty} \sum_{n=0}^{\infty} A_m E(\mathbf{x}_{t-m} \mathbf{x}_{t+s-n}') A_n' = \sum_{m=0}^{\infty} \sum_{n=0}^{\infty} A_m (c_{s+m-n}^{jk}) A_n'.$$


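If we denote the matrix c.g.f. of the process by $S_y(z)$, we have
$$S_y(z) = \sum_{s=-\infty}^{\infty} (C_s^{jk}) z^s = \sum_{s=-\infty}^{\infty} \sum_{m=0}^{\infty} \sum_{n=0}^{\infty} (A_m z^{-m}) (c_{s+m-n}^{jk} z^{s+m-n}) (A_n' z^{n}) = \Big(\sum_{m=0}^{\infty} A_m z^{-m}\Big) S(z) \Big(\sum_{n=0}^{\infty} A_n' z^{n}\Big). \qquad (11)$$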


This is the fundamental result for vector processes corresponding to (5), and by its aid we can 
now discuss the solution of vector stochastic difference equations. It may be compared with 
the corresponding result for continuous process (Bartlett, 1947, eqn. (7), p. 92). 

Consider a vector stochastic difference equation 


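$$\mathbf{x}_t + A_1 \mathbf{x}_{t-1} + \dots + A_k \mathbf{x}_{t-k} = \boldsymbol{\eta}_t. \qquad (12)$$
We suppose that $\boldsymbol{\eta}_t$ is a column vector of $\eta$'s such that their variance-covariance matrix is $B = (b_{ij})$ and that the $\boldsymbol{\eta}_t$ for different values of $t$ are independent. Then the matrix spectral generating function of $\{\boldsymbol{\eta}_t\}$ is also $B$.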


To ensure that the $\{\mathbf{x}_t\}$ process is stationary, we must impose the condition that the roots of
$$\Big| I + \sum_{i=1}^{k} A_i z^i \Big| = 0, \qquad (13)$$
which is an equation of degree $pk$ at most, all lie outside the circle $|z| = 1$. We now show that this condition implies that for each pair $(j, k)$ the series $\sum_{s=0}^{\infty} c_s^{jk}$, $\sum_{s=0}^{\infty} c_{-s}^{jk}$ are dominated by a convergent geometric series. To do this we first obtain a difference equation for each
$$c_s^{jk}\ (= c_{-s}^{kj}, \text{ if } j \ne k)$$


which does not involve the other $c$'s. If $D$ is an operator which lowers by unity the suffix $t$ of any term which it pre- or post-multiplies we can represent the matrix difference equation (12) by
$$\Big(I + \sum_{r=1}^{k} A_r D^r\Big) \mathbf{x}_t = \boldsymbol{\eta}_t.$$
This is algebraically equivalent to $p$ difference equations. Apply the signed operator cofactors of the $i$th column of the operator matrix $\big(I + \sum_{r=1}^{k} A_r D^r\big)$ to each of these $p$ equations in succession and add. Denote these cofactors by $O_1^i(D), \dots, O_p^i(D)$. We then obtain
$$\Big| I + \sum_{r=1}^{k} A_r D^r \Big|\, x_t^i = O_1^i(D)\, \eta_t^1 + \dots + O_p^i(D)\, \eta_t^p. \qquad (14)$$
This is a stochastic difference equation for $x_t^i$ (for fixed $i$) whose right-hand side is a finite moving average of $\eta$'s. The condition on the roots of (13) ensures that the process $\{x_t^i\}$ which it generates will be stationary. The highest power of $D$ occurring on the right-hand side is not greater than $k(p-1)$, and so in (14) no $\eta$ occurs with a suffix earlier than $t - k(p-1)$. If, therefore, we multiply (14) by $x_{t-s}^j$, where $s > k(p-1)$, and take the expectation we get
$$\Big| I + \sum_{r=1}^{k} A_r D^r \Big|\, c_s^{ji} = 0,$$
| i=1 













and the condition on the roots of (13) shows that the series $\sum_{s=0}^{\infty} c_s^{jk}$ is dominated by a convergent geometric series. We obtain a similar result for the series $\sum_{s=0}^{\infty} c_{-s}^{jk} = \sum_{s=0}^{\infty} c_s^{kj}$, which will also be dominated by a convergent geometric series.
It follows that the matrix covariance generating function of the process $\{\mathbf{x}_t\}$ is convergent in an annulus $1 - \delta < |z| < 1 + \delta$ and is given by
$$\Big(I + \sum_{i=1}^{k} A_i z^{-i}\Big)^{-1} B\, \Big(I + \sum_{i=1}^{k} A_i' z^{i}\Big)^{-1},$$
and from this all the serial correlations and the individual power spectra can be calculated.
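For a first-order vector scheme $\mathbf{x}_t + A\mathbf{x}_{t-1} = \boldsymbol{\eta}_t$ the matrix c.g.f. can be evaluated on the unit circle and its zero-order coefficient compared with the covariance matrix obtained directly from the series $\sum_j A^j B A'^j$. The sketch below assumes $k = 1$; the matrices `A` and `B` are made-up illustrative values, chosen so that the latent roots of `A` lie inside the unit circle.

```python
import numpy as np

def var1_cgf(A, B, z):
    """Matrix c.g.f. (I + A z^-1)^-1 B (I + A' z)^-1 of x_t + A x_{t-1} = eta_t."""
    I = np.eye(A.shape[0])
    left = np.linalg.inv(I + A / z)
    right = np.linalg.inv(I + A.T * z)
    return left @ B @ right

A = np.array([[0.5, 0.2], [-0.1, 0.3]])   # hypothetical coefficient matrix
B = np.array([[1.0, 0.3], [0.3, 2.0]])    # hypothetical covariance of eta_t

# Zero-lag covariance as the mean of S(e^{i theta}) over a uniform grid.
M = 512
thetas = 2 * np.pi * np.arange(M) / M
c0_from_cgf = sum(var1_cgf(A, B, np.exp(1j * t)) for t in thetas).real / M

# The same matrix from the moving-average form x_t = sum_j (-A)^j eta_{t-j}.
c0_direct = np.zeros_like(B)
term = B.copy()
for _ in range(200):
    c0_direct += term
    term = A @ term @ A.T

print(c0_from_cgf)
print(c0_direct)
```

Averaging over a uniform grid picks out the coefficient of $z^0$ up to an aliasing error that decays geometrically with the number of grid points.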


REFERENCES 
Bartlett, M. S. (1947). Stochastic Processes. Lectures given at North Carolina.
Cramér, H. (1940). Ann. Math., Princeton, 41, 215.
Khintchine, A. (1934). Math. Ann. 109, 604.
Mann, H. B. & Wald, A. (1943). Econometrica, 11, 173.
Quenouille, M. H. (1947). Biometrika, 34, 365.
Romanovsky, V. (1932). R.C. Mat. Palermo, 56, 1.
Romanovsky, V. (1933). R.C. Mat. Palermo, 57, 130.
Slutzky, E. (1937). Econometrica, 5, 107.
Szegő, G. (1939). Orthogonal Polynomials. New York.
Wold, H. (1938). A Study in the Analysis of Stationary Time Series. Uppsala.
















ON A PROPERTY OF DISTRIBUTIONS ADMITTING 
SUFFICIENT STATISTICS 


By V. S. HUZURBAZAR, Fitzwilliam House, Cambridge


Summary. A property of distributions admitting sufficient statistics is obtained, connecting
the likelihood function of a sample of n observations, the maximum likelihood
estimates of the parameters and the information matrix. A geometric meaning of the property 
is given. The property is used in simplifying the calculations of the variances and covariances 
of the maximum likelihood estimates in large samples. Finally, it is shown in virtue of the 
property that the likelihood equations have a unique solution for every sample of any size, 
and that the solution does make the likelihood function a maximum. 

1. Let $f(x, \theta_1, \theta_2, \dots, \theta_p)$ be the probability density function of a distribution depending on $p$ parameters. For brevity we shall write $(\theta_j)$ for $(\theta_1, \theta_2, \dots, \theta_p)$ as an argument of a function. For simplicity we shall confine ourselves to univariate distributions, but the analysis is exactly the same for multivariate distributions. For multivariate distributions of $q$ random variables, $(x)$ is simply replaced by $(x_1, x_2, \dots, x_q)$. If $x_1, x_2, \dots, x_n$ is a sample of $n$ independent observations from the distribution we shall call, for convenience, $L = \sum_{i=1}^{n} \log f(x_i, \theta_j)$ the likelihood function of the sample.

It has been shown, under general regularity conditions, by Koopman (1936) that the 
necessary and sufficient condition that a distribution depending on p parameters should 


admit a set of p jointly sufficient statistics is that the probability density function of the 
distribution be of the form 


$$f(x, \theta_j) = \exp\Big[\sum_{k=1}^{p} u_k(\theta_j)\, v_k(x) + A(x) + B(\theta_j)\Big], \qquad (1)$$
where the $u_k$'s and $B$ are functions of the $\theta_j$'s only and the $v_k$'s and $A$ are functions of $x$ only.
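Let now $f(x, \theta_j)$ be of the form (1). We have
$$L = \sum_{i=1}^{n} \log f(x_i, \theta_j) = \sum_{k=1}^{p} u_k(\theta_j) \sum_{i=1}^{n} v_k(x_i) + \sum_{i=1}^{n} A(x_i) + nB(\theta_j) = \sum_{k=1}^{p} u_k(\theta_j)\, T_k + \sum_{i=1}^{n} A(x_i) + nB(\theta_j),$$
where $T_k = \sum_{i=1}^{n} v_k(x_i)$;
$$\frac{\partial L}{\partial \theta_r} = \sum_{k=1}^{p} \frac{\partial u_k}{\partial \theta_r} T_k + n \frac{\partial B}{\partial \theta_r} \quad (r = 1, 2, \dots, p). \qquad (2)$$
Let $\hat\theta_1, \hat\theta_2, \dots, \hat\theta_p$ be a solution of the system of 'likelihood equations'
$$\frac{\partial L}{\partial \theta_r} = 0 \quad (r = 1, 2, \dots, p)$$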













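so that the $\hat\theta_j$'s are the maximum likelihood estimates of the $\theta_j$'s. Then
$$\sum_{k=1}^{p} \Big(\frac{\partial u_k}{\partial \theta_r}\Big)_{\hat\theta_j} T_k + n \Big(\frac{\partial B}{\partial \theta_r}\Big)_{\hat\theta_j} = 0 \quad (r = 1, 2, \dots, p), \qquad (3)$$
where by the notation $(\partial u_k / \partial \theta_r)_{\hat\theta_j}$ we mean that the derivative is evaluated at $(\hat\theta_j)$.
Since $E(\partial L / \partial \theta_r) = 0$, we have
$$\sum_{k=1}^{p} \frac{\partial u_k}{\partial \theta_r} E(T_k) + n \frac{\partial B}{\partial \theta_r} = 0 \quad (r = 1, 2, \dots, p). \qquad (4)$$
$$\frac{\partial^2 L}{\partial \theta_r \partial \theta_s} = \sum_{k=1}^{p} \frac{\partial^2 u_k}{\partial \theta_r \partial \theta_s} T_k + n \frac{\partial^2 B}{\partial \theta_r \partial \theta_s} \quad (r, s = 1, 2, \dots, p). \qquad (5)$$
$$E\Big(\frac{\partial^2 L}{\partial \theta_r \partial \theta_s}\Big) = \sum_{k=1}^{p} \frac{\partial^2 u_k}{\partial \theta_r \partial \theta_s} E(T_k) + n \frac{\partial^2 B}{\partial \theta_r \partial \theta_s}. \qquad (6)$$
$$\Big(\frac{\partial^2 L}{\partial \theta_r \partial \theta_s}\Big)_{\hat\theta_j} = \sum_{k=1}^{p} \Big(\frac{\partial^2 u_k}{\partial \theta_r \partial \theta_s}\Big)_{\hat\theta_j} T_k + n \Big(\frac{\partial^2 B}{\partial \theta_r \partial \theta_s}\Big)_{\hat\theta_j}. \qquad (7)$$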


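The $p$ simultaneous linear equations given by (4) enable us to evaluate $E(T_k)$ $(k = 1, 2, \dots, p)$. Substituting for $E(T_k)$ in (6) we shall have
$$E\Big(\frac{\partial^2 L}{\partial \theta_r \partial \theta_s}\Big) = \phi(\theta_j), \text{ say}. \qquad (8)$$
The $p$ simultaneous linear equations given by (3) also enable us to express the $T_k$'s in terms of the $\hat\theta_j$'s, whence we can substitute for the $T_k$'s in (7). But it is interesting to observe that the pair of equations (3) and (7) are exactly of the same form as the pair of equations (4) and (6). In fact, we can obtain the former pair from the latter by writing $T_k$ for $E(T_k)$, $\hat\theta_j$ for $\theta_j$ and $\dfrac{\partial^2 L}{\partial \theta_r \partial \theta_s}$ for $E\Big(\dfrac{\partial^2 L}{\partial \theta_r \partial \theta_s}\Big)$. Hence comparing with (8), the result of substitution for the $T_k$'s in (7) must be
$$\Big(\frac{\partial^2 L}{\partial \theta_r \partial \theta_s}\Big)_{\hat\theta_j} = \phi(\hat\theta_j). \qquad (9)$$
From (8) and (9) we have the property
$$\Big(\frac{\partial^2 L}{\partial \theta_r \partial \theta_s}\Big)_{\hat\theta_j} = \Big\{E\Big(\frac{\partial^2 L}{\partial \theta_r \partial \theta_s}\Big)\Big\}_{\hat\theta_j}, \qquad (10)$$
where by the notation on the right-hand side it is implied that $\theta_j$ is replaced by $\hat\theta_j$ after the expectation is evaluated.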

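Similar argument leads to the generalization of (10) as
$$\Big(\frac{\partial^m L}{\partial \theta_1^{m_1} \partial \theta_2^{m_2} \partial \theta_3^{m_3} \dots}\Big)_{\hat\theta_j} = \Big\{E\Big(\frac{\partial^m L}{\partial \theta_1^{m_1} \partial \theta_2^{m_2} \partial \theta_3^{m_3} \dots}\Big)\Big\}_{\hat\theta_j}, \qquad (11)$$
where $m = m_1 + m_2 + m_3 + \dots$. Setting $r, s = 1, 2, \dots, p$, (10) may be written in the matrix form
$$\Big[\Big(\frac{\partial^2 L}{\partial \theta_r \partial \theta_s}\Big)_{\hat\theta_j}\Big] = \Big[\Big\{E\Big(\frac{\partial^2 L}{\partial \theta_r \partial \theta_s}\Big)\Big\}_{\hat\theta_j}\Big]. \qquad (12)$$
It is convenient to write (12) as
$$\Big[\Big(-\frac{\partial^2 L}{\partial \theta_r \partial \theta_s}\Big)_{\hat\theta_j}\Big] = \Big[\Big\{E\Big(-\frac{\partial^2 L}{\partial \theta_r \partial \theta_s}\Big)\Big\}_{\hat\theta_j}\Big], \qquad (13)$$
where $\Big[E\Big(-\dfrac{\partial^2 L}{\partial \theta_r \partial \theta_s}\Big)\Big]$ is Fisher's 'information matrix'.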
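Relation (13) can be illustrated with the normal distribution with parameters $(\mu, \sigma)$, which is of the form (1). The sketch below differentiates the log-likelihood numerically at the maximum likelihood estimates and compares the result with the information matrix $\mathrm{diag}(n/\sigma^2,\ 2n/\sigma^2)$ evaluated at the same point. The simulated sample, the step size and the function names are assumptions made only for this illustration.

```python
import numpy as np

def log_likelihood(params, x):
    """L for a normal sample with mean mu and standard deviation sigma."""
    mu, sigma = params
    n = x.size
    return (-n * np.log(sigma) - 0.5 * n * np.log(2 * np.pi)
            - np.sum((x - mu) ** 2) / (2 * sigma ** 2))

def numerical_hessian(f, point, h=1e-4):
    """Matrix of second derivatives of f at point by central differences."""
    point = np.asarray(point, dtype=float)
    k = point.size
    H = np.empty((k, k))
    for r in range(k):
        for s in range(k):
            er = np.zeros(k); er[r] = h
            es = np.zeros(k); es[s] = h
            H[r, s] = (f(point + er + es) - f(point + er - es)
                       - f(point - er + es) + f(point - er - es)) / (4 * h * h)
    return H

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=50)       # illustrative sample
n = x.size
mu_hat = x.mean()
sigma_hat = np.sqrt(np.mean((x - mu_hat) ** 2))   # maximum likelihood estimate

# Left-hand side of (13): minus the second derivatives of L at the estimates.
observed = -numerical_hessian(lambda p: log_likelihood(p, x), [mu_hat, sigma_hat])
# Right-hand side of (13): the information matrix with the estimates substituted.
expected = np.array([[n / sigma_hat**2, 0.0], [0.0, 2 * n / sigma_hat**2]])
print(observed)
print(expected)
```

The two matrices agree up to the error of the finite-difference approximation, whatever sample is used.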










2. Geometrical significance. It is possible to give a geometrical meaning to the relation (13). 
We shall, however, confine ourselves to the case of a single parameter @. 


We have
$$\Big(-\frac{\partial^2 L}{\partial \theta^2}\Big)_{\hat\theta} = \Big\{E\Big(-\frac{\partial^2 L}{\partial \theta^2}\Big)\Big\}_{\hat\theta} = I(\hat\theta), \qquad (14)$$
where $I(\theta)$ is the information function of a sample of $n$ observations. $I(\theta)$ is also known as the 'intrinsic accuracy' of the distribution (Fisher, 1925). Since $I(\theta)$ is essentially non-negative, $I(\hat\theta)$ is non-negative, whence from (14) $(-\partial^2 L / \partial \theta^2)_{\hat\theta}$ is non-negative. The curve represented by the likelihood function of a sample may be called the 'likelihood curve'. If $\rho$ is the radius of curvature of the likelihood curve at $\theta = \hat\theta$,
$$\frac{1}{\rho} = \frac{(-\partial^2 L / \partial \theta^2)_{\hat\theta}}{\{1 + (\partial L / \partial \theta)^2_{\hat\theta}\}^{3/2}} = \Big(-\frac{\partial^2 L}{\partial \theta^2}\Big)_{\hat\theta} = I(\hat\theta). \qquad (15)$$
The information function (or the intrinsic accuracy) therefore measures the curvature of the likelihood curve at the point represented by the maximum likelihood estimate.
For large samples we have another interpretation. In large samples the variance of $\hat\theta$ is $1/I(\theta)$ and the estimate of the variance of $\hat\theta$ is $1/I(\hat\theta)$. Hence the radius of curvature of the likelihood curve at the maximum likelihood estimate gives an estimate of the variance of the maximum likelihood estimate in large samples.

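3. The variances and covariances of the maximum likelihood estimates in large samples. If instead of replacing $\theta_j$ by $\hat\theta_j$ we replace $\hat\theta_j$ by $\theta_j$, we shall have in place of (13),
$$\Big[\Big(-\frac{\partial^2 L}{\partial \theta_r \partial \theta_s}\Big)_{\hat\theta_j = \theta_j}\Big] = \Big[E\Big(-\frac{\partial^2 L}{\partial \theta_r \partial \theta_s}\Big)\Big]. \qquad (16)$$
Taking inverses, when the matrices are non-singular,
$$\Big[\Big(-\frac{\partial^2 L}{\partial \theta_r \partial \theta_s}\Big)_{\hat\theta_j = \theta_j}\Big]^{-1} = \Big[E\Big(-\frac{\partial^2 L}{\partial \theta_r \partial \theta_s}\Big)\Big]^{-1}. \qquad (17)$$
The matrix on the right-hand side of (17) is the variance matrix whose elements give the variances and covariances of the maximum likelihood estimates in large samples. But the relation (17) shows that it is not necessary to evaluate the expectations $E\Big(-\dfrac{\partial^2 L}{\partial \theta_r \partial \theta_s}\Big)$. The variances and covariances can be simply obtained from the matrix on the left-hand side of (17).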


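As an example (cf. Kendall, 1946, p. 37) consider the estimation of the five parameters of the bivariate normal distribution
$$dF = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1-\rho^2}} \exp\Big[-\frac{1}{2(1-\rho^2)} \Big\{\frac{(x-\alpha)^2}{\sigma_1^2} - \frac{2\rho (x-\alpha)(y-\beta)}{\sigma_1 \sigma_2} + \frac{(y-\beta)^2}{\sigma_2^2}\Big\}\Big]\, dx\, dy.$$
The maximum likelihood estimates are
$$\hat\alpha = \bar{x}, \quad \hat\beta = \bar{y}, \quad \hat\sigma_1^2 = \frac{1}{n} \sum (x - \bar{x})^2, \quad \hat\sigma_2^2 = \frac{1}{n} \sum (y - \bar{y})^2, \quad \hat\rho\, \hat\sigma_1 \hat\sigma_2 = \frac{1}{n} \sum (x - \bar{x})(y - \bar{y}).$$
$$L = -n \log\{2\pi \sigma_1 \sigma_2 \sqrt{1-\rho^2}\} - \frac{n}{2(1-\rho^2)} \Big\{\frac{\hat\sigma_1^2}{\sigma_1^2} - \frac{2\rho \hat\rho\, \hat\sigma_1 \hat\sigma_2}{\sigma_1 \sigma_2} + \frac{\hat\sigma_2^2}{\sigma_2^2} + \frac{(\hat\alpha - \alpha)^2}{\sigma_1^2} - \frac{2\rho (\hat\alpha - \alpha)(\hat\beta - \beta)}{\sigma_1 \sigma_2} + \frac{(\hat\beta - \beta)^2}{\sigma_2^2}\Big\}.$$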












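As an illustration of the calculation of the elements of the information matrix, take $E\Big(-\dfrac{\partial^2 L}{\partial \sigma_1^2}\Big)$. Now
$$\frac{\partial^2 L}{\partial \sigma_1^2} = \frac{n}{\sigma_1^2} - \frac{n}{1-\rho^2} \Big\{\frac{3\hat\sigma_1^2}{\sigma_1^4} - \frac{2\rho \hat\rho\, \hat\sigma_1 \hat\sigma_2}{\sigma_1^3 \sigma_2} + \frac{3(\hat\alpha - \alpha)^2}{\sigma_1^4} - \frac{2\rho (\hat\alpha - \alpha)(\hat\beta - \beta)}{\sigma_1^3 \sigma_2}\Big\}.$$
Hence, replacing the estimates by the parameters,
$$E\Big(-\frac{\partial^2 L}{\partial \sigma_1^2}\Big) = -\frac{n}{\sigma_1^2} + \frac{n}{1-\rho^2} \Big(\frac{3}{\sigma_1^2} - \frac{2\rho^2}{\sigma_1^2}\Big) = \frac{n(2-\rho^2)}{\sigma_1^2 (1-\rho^2)}.$$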


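4. Maxima of the likelihood function. Let $\hat\theta_j$ $(j = 1, 2, \dots, p)$ be a solution of the system of likelihood equations. The information matrix
$$\Big[E\Big(-\frac{\partial^2 L}{\partial \theta_r \partial \theta_s}\Big)\Big] = n \Big[E\Big\{-\frac{\partial^2}{\partial \theta_r \partial \theta_s} \log f(x, \theta_j)\Big\}\Big] = n \Big[E\Big\{\frac{\partial}{\partial \theta_r} \log f(x, \theta_j)\, \frac{\partial}{\partial \theta_s} \log f(x, \theta_j)\Big\}\Big]$$
is essentially non-negative for all $\theta_j$'s, since it is a matrix of the variances and covariances of $\dfrac{\partial}{\partial \theta_r} \log f(x, \theta_j)$, $r = 1, 2, \dots, p$. Moreover, if for a certain set of values of the $\theta_j$'s any of the principal minors of the matrix $\Big[E\Big\{\dfrac{\partial}{\partial \theta_r} \log f(x, \theta_j)\, \dfrac{\partial}{\partial \theta_s} \log f(x, \theta_j)\Big\}\Big]$ vanishes then the functions
$$\frac{\partial}{\partial \theta_r} \log f(x, \theta_j) \quad (r = 1, 2, \dots, p)$$
will be linearly dependent for that set of values of the $\theta_j$'s. We shall, however, exclude such exceptional cases and assume that the functions $\dfrac{\partial}{\partial \theta_r} \log f(x, \theta_j)$ are linearly independent for all sets of values of the $\theta_j$'s so that the information matrix $\Big[E\Big(-\dfrac{\partial^2 L}{\partial \theta_r \partial \theta_s}\Big)\Big]$ is positive definite for all $\theta_j$'s.

Hence $\Big[\Big\{E\Big(-\dfrac{\partial^2 L}{\partial \theta_r \partial \theta_s}\Big)\Big\}_{\hat\theta_j}\Big]$ is positive definite, and, in virtue of (13) it follows that the matrix $\Big[\Big(-\dfrac{\partial^2 L}{\partial \theta_r \partial \theta_s}\Big)_{\hat\theta_j}\Big]$ is positive definite. The matrix $\Big[\Big(\dfrac{\partial^2 L}{\partial \theta_r \partial \theta_s}\Big)_{\hat\theta_j}\Big]$ is negative definite and $L$ is therefore maximum at $\theta_j = \hat\theta_j$.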

It also follows that a solution of the system of likelihood equations is unique. For by what 
we have just shown every solution of the likelihood equations gives a stationary maximum. 
If there were two or more distinct solutions all would be stationary maxima, and between two 
stationary maxima we should have a stationary minimum, under regularity conditions. But 
there is no solution which gives a minimum. Hence the system of likelihood equations has
a unique solution at which the likelihood function is maximum for every sample of any size.
In a recent paper (Huzurbazar, 1948) I have discussed the maxima of the likelihood function 
of a sample from any distribution depending on a single parameter. 


REFERENCES 


Fisher, R. A. (1925). Proc. Camb. Phil. Soc. 22, 700-25.
Huzurbazar, V. S. (1948). Ann. Eugen., Lond., 14, 185-200.
Kendall, M. G. (1946). The Advanced Theory of Statistics, 2. London.
Koopman, B. O. (1936). Trans. Amer. Math. Soc. 39, 399-409.




ON A METHOD OF TREND ELIMINATION 
By M. H. QUENOUILLE 


1. INTRODUCTION 


The problem of trend elimination is familiar in biology and in economics. Three main methods 
exist at present for the purpose of eliminating trend. First, there is the method of block 
elimination by which the data are broken into groups and the difference between groups is 
eliminated. This has the disadvantage that when the trend is marked it is only partially 
eliminated, and a second method whereby a curve is fitted to the observations may be used 
(e.g. Fisher, 1924). This method, although it allows the effect of the trend elimination to be 
estimated, can be criticized on the grounds that it is seldom possible to represent a trend by 
an algebraic curve of low degree, and the residuals will be correlated if the algebraic repre- 
sentation is inadequate and will lead in consequence to spurious correlations. In any case, 
whereas there may be little doubt about the existence of an ‘ideal’ curve, this curve may not 
be readily realized in practice. Such is often true with growth curves where a deviation at an 
early stage is reflected throughout the subsequent observations, or where the curve is in 
reality discontinuous as with the sudden pause in the rate of growth at puberty or the 
feeding-stuffs requirements of dairy cattle. This difficulty has led to the third method of trend 
elimination, namely, the method of moving averages. By this method, curves are fitted to 
sets of points and used to find the deviation for the central point of each set. Thus in effect 
$$\sum_{t=i-m}^{i+m} \{u_t - a_0 - a_1 (t-i) - \dots - a_p (t-i)^p\}^2$$
is minimized and $a_0$ is used instead of $u_i$. This process is repeated to find the 'smoothed' values for the other observations, but in practice the method is formularized so that the smoothed value for $u_i$ is obtained from a moving average $\sum_{j=-m}^{m} \delta_j u_{i+j}$, where $\delta_j$ is independent of $i$ and conventionally, but not necessarily, $\delta_j = \delta_{-j}$. However, the effect of the correlation
introduced into the residual series by the moving average is not easily ascertained and tests of 
significance are complicated. 

In the following paper, a compromise between curve-fitting and moving-average ap- 
proaches to trend elimination is suggested by which curves are fitted to portions of the series 
of observations, the observations in each portion being assumed to be equally spaced. 

The method here proposed concentrates primarily on eliminating those systematic elements 
which are generally known as trend, in order to investigate the residuals. It will not always 
provide a trend line in the sense of a smooth curve, and attention in the following is concen- 
trated particularly on trend lines which have a series of discontinuities in their first deri- 
vatives. This is not an essential of the method here described, and it is possible to ensure that 
the trend obtained is smooth, but this in my opinion can only be done at the expense of the 
residuals. If we insist upon a smooth trend line then we impose upon the independence of the 
residuals whenever this assumption is not justified in practice. From this viewpoint the dis- 
continuities in the fitted trend are far from being a disadvantage. This method is hardly new,* 
but it is believed that some of the methods adopted here make it worthy of greater application. 


* [See, for example, E. C. Rhodes (1921), Tracts for Computers, no. VI, Cambridge University Press.
Ed.]










2. THE FORM OF THE TREND 


The simplest form of trend that might be taken is a series of straight lines fitted to successive 
sets of three points. This may be done by the method of least squares by fitting constants 


$$2b_1,\ b_1 + b_2,\ 2b_2,\ b_2 + b_3,\ 2b_3,\ b_3 + b_4,\ \text{etc.},$$
to represent the trend values at successive points of the series. This might be compared with the analysis of randomized blocks of two, since in this case the first differences of the trend values are
$$b_2 - b_1,\ b_2 - b_1,\ b_3 - b_2,\ b_3 - b_2,\ b_4 - b_3,\ b_4 - b_3,\ \text{etc.}$$
Similarly, the scheme obtained by fitting constants
$$s b_1,\ (s-1) b_1 + b_2,\ (s-2) b_1 + 2b_2,\ (s-3) b_1 + 3b_2,\ \text{etc.},$$
might be compared with randomized blocks of $s$. This scheme, which corresponds to the fitting of straight lines to consecutive sets of $s$ points, will be said to be of separation $s$.
Again, the trend obtained by fitting a series of overlapping quadratic parabolas to successive sets of four points corresponds to the set of constants
$$3b_1 + b_2,\ b_1 + 3b_2,\ 3b_2 + b_3,\ b_2 + 3b_3,\ 3b_3 + b_4,\ \text{etc.},$$
and since the second differences are equal in pairs, this will be said to be of separation two and degree two.


Schemes of any separation and degree may be generated as follows. Suppose
$$(1 + t + \dots + t^{s-1})^{d+1} = a_0 + a_1 t + a_2 t^2 + \dots,$$
then
$$\sum_j a_{js}\, b_{j+1}, \quad \sum_j a_{js+1}\, b_{j+1}, \quad \sum_j a_{js+2}\, b_{j+1}, \quad \text{etc.},$$
represent a scheme of separation $s$ and degree $d$. For the $(d+1)$th differences of this series involve the $(d+1)$th differences of the constants $a$. These are generated by multiplying
$$\sum_{j=0}^{\infty} a_j t^j = \frac{(1 - t^s)^{d+1}}{(1 - t)^{d+1}}$$
by $(1 - t)^{d+1}$, i.e. they are given by $(1 - t^s)^{d+1}$ and hence are zero within sets of $s$. For example, if $s = 3$, $d = 2$,
$$(1 + t + t^2)^3 = 1 + 3t + 6t^2 + 7t^3 + 6t^4 + 3t^5 + t^6,$$
and the scheme is represented by
$$b_1 + 7b_2 + b_3,\ 6b_2 + 3b_3,\ 3b_2 + 6b_3,\ b_2 + 7b_3 + b_4,\ \dots.$$
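The generating rule above is easy to mechanize. The following sketch expands $(1 + t + \dots + t^{s-1})^{d+1}$, groups the coefficients $a_j$ by the residue of $j$ modulo $s$ (these are the multipliers that appear together in one term of the scheme), and confirms that the $(d+1)$th differences of the $a_j$ vanish except at multiples of $s$. The function names are illustrative only.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def scheme_coefficients(s, d):
    """Coefficients a_0, a_1, ... of (1 + t + ... + t^(s-1))^(d+1)."""
    a = [1]
    for _ in range(d + 1):
        a = poly_mul(a, [1] * s)
    return a

def multipliers_by_position(s, d):
    """Group a_j by j mod s: the multipliers of the b's in each kind of term."""
    a = scheme_coefficients(s, d)
    return [a[r::s] for r in range(s)]

a = scheme_coefficients(3, 2)
print(a)                              # [1, 3, 6, 7, 6, 3, 1]
print(multipliers_by_position(3, 2))  # [[1, 7, 1], [3, 6], [6, 3]]

# (d+1)th differences of the a's: multiply by (1 - t)^(d+1) to get (1 - t^s)^(d+1).
diffs = a
for _ in range(3):
    diffs = poly_mul(diffs, [1, -1])
print(diffs)                          # [1, 0, 0, -3, 0, 0, 3, 0, 0, -1]
```

For $s = 3$, $d = 2$ the groups $\{1, 7, 1\}$, $\{3, 6\}$ and $\{6, 3\}$ are the multipliers in the terms $b_1 + 7b_2 + b_3$, $6b_2 + 3b_3$ and $3b_2 + 6b_3$ of the example.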


The separation, it is seen, also represents the number of observations between the introduction of successive constants $b_j$.

This notation does not cover all possible schemes, since 'hybrid' schemes of straight lines and parabolas exist, but these are of little interest. Of more practical interest are non-symmetrical schemes, in which fresh constants suddenly appear and slowly fade away:
$$2b_1 + 3b_2,\ b_1 + 4b_2,\ 2b_2 + 3b_3,\ b_2 + 4b_3,\ \dots$$


gives an elementary example of such a scheme, but these have yet to be investigated. 
In the following section the emphasis will be on schemes of separation two, since these are 
the most efficient, but the results are applicable to schemes of all separations. 







3. END-ADJUSTMENTS* 


The methods of trend generation considered in the last section have all regarded the series as starting at some arbitrary point, and while this is perfectly justified, it is possible to carry out end-corrections which enable the subsequent analysis to be simplified. For example, if the set of constants
$$b_0 + b_1,\ 2b_1,\ b_1 + b_2,\ \dots,\ 2b_n,\ b_n + b_{n+1}$$
are fitted to the observations $x_1, x_2, \dots, x_{2n+1}$ by the method of least squares, this amounts in effect to the rejection of the end-observations, since these are used to determine $b_0$ and $b_{n+1}$. The same result may be obtained by fitting the constants
$$2b_1 + a - c,\ 2b_1,\ b_1 + b_2,\ \dots,\ 2b_n,\ 2b_n + a + c,$$
which minimizes
$$2\Big(\frac{x_1 + x_{2n+1}}{2} - a - b_1 - b_n\Big)^2 + 2\Big(\frac{x_{2n+1} - x_1}{2} - c + b_1 - b_n\Big)^2 + (x_2 - 2b_1)^2 + (x_3 - b_1 - b_2)^2 + \dots + (x_{2n} - 2b_n)^2.$$
It will be assumed in the following sections that $a = 0$, i.e. that the slopes at either end of the range are equal and that $x_1$ and $x_{2n+1}$ are only one-half as accurate as the other observations. These assumptions are unlikely to be true in practice, but it is immediately obvious that the effect of deviations from these assumptions will tend to cancel each other and that for many purposes their effect will be negligible. The advantage of such assumptions lies in the fact that, if $x_1' = \tfrac12 (x_1 + x_{2n+1})$, then $x_1', x_2, \dots, x_{2n}$ are circularly related, the least-squares equations are derived by minimizing
$$(x_1' - b_1 - b_n)^2 + (x_2 - 2b_1)^2 + (x_3 - b_1 - b_2)^2 + \dots + (x_{2n} - 2b_n)^2,$$

and the matrix of the least-squares equations is a symmetrical circulant. 
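The circulant structure can be seen by forming the normal equations explicitly. The sketch below builds the design matrix of the circular scheme $b_n + b_1,\ 2b_1,\ b_1 + b_2,\ \dots,\ 2b_n$ for $2n$ observations (assuming $n \ge 3$) and forms its cross-product matrix; the function name `circular_design` is hypothetical.

```python
import numpy as np

def circular_design(n):
    """Design matrix of the circular scheme b_n + b_1, 2 b_1, b_1 + b_2, ..., 2 b_n."""
    X = np.zeros((2 * n, n))
    for k in range(n):
        # Odd-numbered observation: the sum of two neighbouring constants.
        X[2 * k, (k - 1) % n] += 1
        X[2 * k, k] += 1
        # Even-numbered observation: twice a single constant.
        X[2 * k + 1, k] = 2
    return X

n = 6
X = circular_design(n)
normal_matrix = X.T @ X   # matrix of the least-squares equations
print(normal_matrix.astype(int))
```

Each constant contributes $1 + 4 + 1 = 6$ to the diagonal and shares one observation with each neighbour, which gives a symmetrical circulant with first row $6, 1, 0, \dots, 0, 1$.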

In general, end-adjustments will be made to reduce the matrix to a circulant. For example, 
for the scheme of separation two and degree two, if 


© = H(3%qn 4,42), T= Han;2+ 3xq), 


then $x_1', x_2', x_3, \dots, x_{2n}$ are circularly related, provided that we assume that the variances of $x_1'$
and $x_2'$ are equal to the variances of the other observations.


4. THE INVERSION OF SYMMETRICAL CIRCULANT MATRICES 


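Under the assumptions of the last section, the problem of solving the least-squares equations
is that of inverting a symmetrical circulant matrix with a large number of zero elements. For
example, the fitting of $b_n+b_1,\ 2b_1,\ b_1+b_2,\ 2b_2$, etc., necessitates the inversion of

$$A=\begin{bmatrix} 6 & 1 & 0 & \cdots & 0 & 1\\ 1 & 6 & 1 & \cdots & 0 & 0\\ 0 & 1 & 6 & \cdots & 0 & 0\\ \vdots & & & & & \vdots\\ 0 & 0 & 0 & \cdots & 6 & 1\\ 1 & 0 & 0 & \cdots & 1 & 6 \end{bmatrix}\quad\text{of order } n.$$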








* The following method of end-adjustments should be compared with the method given by Yates 
(1948). 











78 On a method of trend elimination 


The number of different non-zero elements not lying in the diagonals will be termed the
extent $e$ of the matrix. It will be readily seen that the extent of the matrices derived from
schemes of separation $s$ and degree $d$ is $d-[d/s]$, where $[\ ]$ represents the integer part, since
this is one less than the maximum number of non-zero coefficients of the $b_i$ in any term, which
from §2 is the smallest integer not less than

$$\frac{\text{Number of non-zero } a_i}{s} = \frac{(d+1)(s-1)+1}{s}.$$


Table 1 gives some values of this function. It will be seen later that the inversion of a circulant
matrix is equivalent to solving an equation of degree equal to the extent of the matrix, and of
inverting a matrix of order equal to the extent of the original matrix, irrespective of the order
$n$ of the original matrix. Fortunately, it will be seen that we are primarily interested in
schemes of extent less than three.

Table 1. Extent of least-squares matrix

                  Separation
  Degree      2     3     4     5     6
    1         1     1     1     1     1
    2         1     2     2     2     2
    3         2     2     3     3     3
    4         2     3     3     4     4
    5         3     4     4     4     5
    6         3     4     5     5     5
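As a quick check on Table 1, the extent $d-[d/s]$ can be tabulated directly. This is a small sketch; the names `extent` and `table` are illustrative.

```python
def extent(s, d):
    """Extent d - [d/s] of the least-squares matrix for separation s and degree d."""
    return d - d // s

# Rows are degrees 1..6, columns are separations 2..6, as in Table 1.
table = {d: [extent(s, d) for s in range(2, 7)] for d in range(1, 7)}
for d, row in table.items():
    print(d, row)
```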

The inverse of a symmetrical circulant matrix may be written in the form

$$\begin{bmatrix} u_0 & u_1 & u_2 & \cdots & u_2 & u_1\\ u_1 & u_0 & u_1 & \cdots & u_3 & u_2\\ u_2 & u_1 & u_0 & \cdots & u_4 & u_3\\ \vdots & & & & & \vdots\\ u_1 & u_2 & u_3 & \cdots & u_1 & u_0 \end{bmatrix}$$

and the $u_i$ will be related linearly. For example, for the matrix $A$

$$u_i+6u_{i+1}+u_{i+2}=0\qquad (i = 0,1,2,\ldots,[\tfrac12 n]-2),$$

with supplementary conditions at either end of the range for $i$. The principal relation between
the $u_i$ may be solved by putting $u_i = y^i$ and obtaining a set of $2e$ roots $y_1, 1/y_1, y_2, 1/y_2, \ldots$, where
$1>|y_1|>|y_2|>\ldots$, so that $u_i = \alpha_1 y_1^{i}+\beta_1 y_1^{-i}+\alpha_2 y_2^{i}+\beta_2 y_2^{-i}+\ldots$.

This may be further simplified using the end-conditions for $i>[\tfrac12 n]-2$, which give
$\beta_j = \alpha_j y_j^{n}$, so that the equation for $u_i$ may be written

$$Cu_i = \frac{c_1}{1-y_1^{n}}\,(y_1^{i}+y_1^{n-i})+\ldots+\frac{c_e}{1-y_e^{n}}\,(y_e^{i}+y_e^{n-i}),$$






















M. H. QUENOUILLE 79 


where $C$ is the first coefficient in the recurrence relation for $u_i$ (usually unity), and $c_1, c_2, \ldots$
are constants which can be determined using the end-conditions for $i$ small. These give the
equations

$$\sum_{j=1}^{e} c_j\,(y_j^{p}-y_j^{-p}) = 0\quad (0<p<e),\qquad = 1\quad (p=e),$$

which are independent of $n$, the order of the matrix.
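As a worked instance, the constants for the matrix $A$ (extent one, $C = 1$) follow from a single condition. The derivation below is a sketch that assumes the end-condition for $e = 1$ reads $c_1(y_1 - y_1^{-1}) = 1$; the resulting $c_1$ agrees with the tabulated $v_0$ for separation two and degree one.

```latex
\begin{aligned}
&u_i + 6u_{i+1} + u_{i+2} = 0,\quad u_i = y^i
  \;\Longrightarrow\; y^2 + 6y + 1 = 0,\\
&y_1 = -3 + 2\sqrt{2} \approx -0.1715729,\qquad 1/y_1 = -3 - 2\sqrt{2},\\
&c_1\,(y_1 - y_1^{-1}) = c_1 \cdot 4\sqrt{2} = 1
  \;\Longrightarrow\; c_1 = \frac{1}{4\sqrt{2}} = \frac{\sqrt{2}}{8} \approx 0.1767767 .
\end{aligned}
```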


If we now let

$$v_i = \lim_{n\to\infty} u_i = \sum_{j=1}^{e} c_j y_j^{i},$$

then

$$u_i = v_i + (v_{n-i}+v_{n+i}) + (v_{2n-i}+v_{2n+i}) + \ldots.$$

For example, for matrix $A$, $y_1 = -3+2\sqrt2$ and $v_i = \tfrac18\sqrt2\,(-3+2\sqrt2)^{i}$. The values of $v_i$ for
many of the more important forms of trends are given in Table 2 and also the values of

$$V = v_0 + 2\sum_{i=1}^{\infty} v_i = s^{-2d-1},$$


Table 2. Values of v_i

Separation s        2             2             2             2
Degree d            1             2             3             4
Extent e            1             1             2             2

                 x 10^-7       x 10^-7       x 10^-7       x 10^-7
v_0              1767767        625000        228918         87346
v_1              -303301       -208333       -109330        -52408
v_2                52038         69444         49094         28329
v_3                -8928        -23148        -21930        -15024
v_4                 1532          7716          9791          7938
v_5                 -263         -2572         -4371         -4191
v_6                   45           857          1952          2212
v_7                   -8          -286          -871         -1168
v_8                    1            95           389           616
v_9                    -           -32          -174          -325
v_10                   -            11            78           172
v_11                   -            -4           -35           -91
v_12                   -             1            15            48
v_13                   -             -            -7           -25
v_14                   -             -             3            13
v_15                   -             -            -1            -7
v_16                   -             -             1             4
v_17                   -             -             -            -2
v_18                   -             -             -             1

V              0.1249999     0.0312498     0.0078126     0.0019530
y_1           -0.1715729    -0.3333333    -0.4464628    -0.5278641
y_2                -             -        -0.0395661    -0.1055728

















Table 2 (cont.). Values of v; 






































Separation s 3 3 4 4 
Degree d 1 2 1 2 
Extent e 1 2 1 2 
«36-? x 10-7 x 10-7 «x 10-8 
| Ue 580259 99463 255155 253225 
v; — 128115 — 40561 — 61341 — 109747 
Vg 28286 15853 14747 45027 
U3 — 6245 — 6182 — 3545 — 18401 
U4 1379 2411 852 7518 
Us — 304 — 940 — 205 — 3071 
Us 67 367 49 1255 
Vq —15 — 143 —12 — 513 
Us 3 56 3 209 
Uy «% —22 — — 86 
Vie _ 9 - 35 
vn ~ «% = —14 
Vis “ 1 * 6 
U3 — — —2 
Via — — _— 1 
V 0-0370369 0-0041155 0-0156249 0-00097659 | 
Wy — 92207890 | — 0-3899122 — 0-2404082 —0-4085474 | 
Yo = | — 0-0212657 - —0-0301991 —! 
Separation s 5 5 6 6 
Derree d ] 2 l 2 
Wxtent e€ l 2 l 2 
; x 10-8 x 10-8 x 10-8 x. 10-® 
| 
% 1333333 84757 TSU4LBRS 350881 
vy — 333335 | — 3sl62 — 19304 — 158322 
Vg 83333 j 15989 oORO4 67065 
Us — 20833 } — 6667 — 12096 — 28257 
M% 5208 i 3319 11900 
"5 — 1302 — 1158 — $47 — d011 
oP 326 483 216 2110 
vy —81 ~201 — 55 — 889 
Te 20 84 14 374 
| My —5 —35 -4 — 158 
a 1 15 ] 66 
"a _ —6 = — 28 
“4 — 3 — 12 
| Uys — -1 — —5 
Via = — —- 2 
Y15 gre be sas rl 
V 0-00800001 0-00032003 0-00462964 0-000128597 
%y — 0-2500000 — 0-4167796 — 0-2553580 — 0-42111965 
| Yo _ — 0-0346458 _ — 0-03715109 















































M. H. QuENOUILLE 


Table 2 (cont.). Values of v; 


81 






































Separation s 7 8 8 9 
Degree d 1 1 2 1 
Extent e 1 1 2 1 
x 10-8 x 10-* x10? x 19-* 
Up 494971 333126 84767 234713 
Vv; — 128020 — 86877 — 38765 — 61560 
Ve 33111 22657 16596 16146 
Vs — 8564 — 5909 — 7063 — 4235 
Uy 2215 1541 3004 1lll 
Us — 573 — 402 — 1278 — 291 
Vg 148 105 544 76 
V, —38 —27 — 231 — 20 
Vs 10 7 98 5 
Vp —3 —2 —42 -1 
V9 1 -—— 18 — 
Vy _— — —8 — 
Vie —_ — 3 — 
V3 — — —1 = 
V 0-00291545 0-00195312 0-000030517 0-00137175 
Wy — 0-2586413 — 0-2607940 — 0-42533078 — 0-2622800 
Ye —_ — — 0-03970799 — 
Separation s 10 12 16 20 
Degree d 1 1 1 1 
Extent e 1 1 1 1 
xi x 10-8 x 10-8 x 10-8 
Up 171499 99546 42122 21597 
% — 45164 — 26354 — 11211 — 5762 
Ve 11894 6977 2984 1537 
Vs — 3132 — 1847 — 794 —410 
% 825 489 211 109 
Us —217 — 129 — 56 —29 
Us 57 34 15 8 
v, —15 -9 ~~ -3 
Us + 2 1 1 
Up -1 -1 — — 
V 0-00099999 0-00057870 0-00024414 0-00012501 
Y — 0-2633479 — 0-2647455 — 02661424 — 0-2667914 

























which will be used in subsequent applications of this theory. Thus, for example, if parabolas
of separation two were fitted to twenty observations, two observations would be used in
making the end-corrections, and hence $d = 2$, $s = 2$, $n = 9$, $e = 1$, and the inverse matrix
contains five elements, which are given by

  u_0:   0.0625000 - 2 x 0.0000032             =  0.0624936
  u_1:  -0.0208333 + 0.0000095 + 0.0000011     = -0.0208227
  u_2:   0.0069444 - 0.0000286 - 0.0000004     =  0.0069154
  u_3:  -0.0023148 + 0.0000857 + 0.0000001     = -0.0022290
  u_4:   0.0007716 - 0.0002572                 =  0.0005144

Similarly, for eighteen observations,

  u_0:   0.0625000 + 2 x 0.0000095             =  0.0625190
  u_1:  -0.0208333 - 0.0000286 - 0.0000032     = -0.0208651
  u_2:   0.0069444 + 0.0000857 + 0.0000011     =  0.0070312
  u_3:  -0.0023148 - 0.0002572 - 0.0000004     = -0.0025724
  u_4:   2 x 0.0007716 + 2 x 0.0000001         =  0.0015434
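The folding of the limiting values $v_i$ into the elements $u_i$ of a finite inverse can be sketched as follows. This is an illustration rather than the author's computing scheme: the helper `wrapped_inverse_elements` and the truncation at 20 wraps are assumptions, $v_i = \tfrac18\sqrt2\,(-3+2\sqrt2)^i$ is the form given for matrix $A$, and $v_i = 0.0625\,(-\tfrac13)^i$ is read from the Table 2 column for $s = 2$, $d = 2$.

```python
import numpy as np

def wrapped_inverse_elements(v, n, wraps=20):
    """u_i = v_i + (v_{n-i} + v_{n+i}) + (v_{2n-i} + v_{2n+i}) + ...  for i = 0..[n/2]."""
    u = []
    for i in range(n // 2 + 1):
        total = v(i)
        for k in range(1, wraps):
            total += v(k * n - i) + v(k * n + i)
        u.append(total)
    return np.array(u)

# Matrix A: circulant with 6 on the diagonal and 1 on the neighbouring diagonals.
y1 = -3 + 2 * np.sqrt(2)
v_A = lambda i: np.sqrt(2) / 8 * y1 ** i
n = 9
eye = np.eye(n)
A = 6 * eye + np.roll(eye, 1, axis=1) + np.roll(eye, -1, axis=1)
u_direct = np.linalg.inv(A)[0, : n // 2 + 1]     # direct numerical inversion
u_series = wrapped_inverse_elements(v_A, n)      # folding of the v_i

# Parabolas of separation two (d = 2, s = 2): v_0 = 0.0625, y_1 = -1/3.
v_par = lambda i: 0.0625 * (-1 / 3) ** i
u_20 = wrapped_inverse_elements(v_par, 9)        # twenty observations, n = 9
u_18 = wrapped_inverse_elements(v_par, 8)        # eighteen observations, n = 8
print(np.round(u_20, 7), np.round(u_18, 7))
```

For matrix $A$ the folded series agrees with direct inversion, and `u_20`, `u_18` agree with the two sets of five elements quoted above to within the rounding of the tabulated $v_i$.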


5. DETERMINATION OF THE FORM OF TREND 


Given a set of observations it will be necessary to decide the degree and separation of the 
appropriate form of trend. The separation which might be compared with the number of 
observations in each block of a randomized block design will depend largely upon the form of 
the observations and the tests that are subsequently to be made. Although trends of separa- 
tion two are in general the most efficient forms, if the number of observations is small the 
number of degrees of freedom may necessitate the use of a trend of higher separation. Also if 
it is desired to test or eliminate a cyclic effect then the separation may be taken equal to the 
period so that the trend and cyclic effect are orthogonal. The degree of the appropriate trend 
forms a larger problem, and it is necessary to investigate these forms further to decide upon 
the appropriate degree. However, it is always possible to fit forms of increasingly higher 
degree until no further improvement is obtained. 

If $n$ observations are taken, then $n$ comparisons may be made of which $n-1$ are independent.
Thus $x_1-x_2,\ x_2-x_3,\ \ldots,\ x_{n-1}-x_n,\ x_n-x_1$ form $n$ such comparisons. These can be
combined to give an estimate

$$\sum_{i=1}^{n-1}\sum_{j=1}^{n-i}(x_i-x_{i+j})^2/n(n-1)$$

of the error variance, which under normal-law theory will be the most efficient. If, however,
a trend is present, then comparisons such as $x_1-x_4 = x_1-x_2+x_2-x_3+x_3-x_4$ will tend to
increase the estimate of error variance so that it is necessary effectively to rule out such
comparisons in favour of the more precise comparisons $x_i-x_{i+1}$. This is done by omitting every
$s$th comparison $x_i-x_{i+1}$, and combining the remainder in the most efficient manner, thus
giving the randomized block analysis. For example, randomized blocks of two correspond to
the comparisons

$$x_1-x_2,\ x_3-x_4,\ x_5-x_6,\ \text{etc.},$$

while randomized blocks of three correspond to the comparisons

$$x_1-x_2,\ x_2-x_3,\ x_4-x_5,\ x_5-x_6,\ \text{etc.},$$

which are equivalent to

$$x_1-x_3,\ x_1-2x_2+x_3,\ x_4-x_6,\ x_4-2x_5+x_6,\ \text{etc.}$$

Similarly, the comparisons

$$x_1-2x_2+x_3,\ x_2-2x_3+x_4,\ \ldots,\ x_n-2x_1+x_2,$$

can be combined to give an estimate of error variance, but if a trend is present this may be
improved by the omission of every $s$th comparison. The analysis then corresponds to that of
a trend of degree one and separation $s$. Similarly, the comparisons

$$x_1-3x_2+3x_3-x_4,\ x_2-3x_3+3x_4-x_5,\ \text{etc.},$$

can be used to form trends of degree two, etc.

The main consequence of this is that the appropriate degree of suitable schemes can be
found exactly for schemes of separation two and with a high degree of accuracy for schemes of
higher separation by the variate-difference method, and that the residual variance for
schemes of separation can be directly estimated using the variate-difference method. If it is
desired to test another effect simultaneously then this should be eliminated before using the
variate differences. It should perhaps be noted that the variate-difference method is in fact
more appropriate for this purpose than for the general problem of fitting a polynomial to a set
of observations. Thus it will be seen from Table 8 that because of a small rounding-off error
we should be led to suppose that a cubic curve could represent the normal curve from
$x = -3$ to $x = 3$. Similarly for the trend $n, n-1, \ldots, 2, 1, 0, 1, 2, \ldots, n$, the first five variate
differences are

$$\frac12,\quad \frac{2}{3(2n-1)},\quad \frac{1}{5(n-1)},\quad \frac{12}{35(2n-3)},\quad \frac{10}{63(n-2)}.$$

The ratios of the last four of these tend rapidly to unity, so that a random element would completely obscure
any slight difference and we should be led to suppose that a straight line or possibly a parabola
would adequately represent the data.
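The variate-difference estimates for the V-shaped trend can be reproduced with a short sketch. It assumes the usual divisor for the method, namely that the sum of squares of the $k$th differences of $N$ observations is divided by $(N-k)\binom{2k}{k}$; the function name is illustrative.

```python
from math import comb
import numpy as np

def variate_difference_variances(x, max_order):
    """Variance estimates from successive differences of the series x.

    The k-th estimate is sum((k-th differences)^2) / ((N - k) * C(2k, k)).
    """
    diffs = np.asarray(x, dtype=float)
    estimates = []
    for k in range(1, max_order + 1):
        diffs = np.diff(diffs)                      # k-th differences, N - k of them
        estimates.append(float(np.sum(diffs ** 2) / (diffs.size * comb(2 * k, k))))
    return estimates

n = 10
trend = np.abs(np.arange(-n, n + 1))                # n, n-1, ..., 1, 0, 1, ..., n
estimates = variate_difference_variances(trend, 5)
print(estimates)
```

With $n = 10$ the five values are $1/2$, $2/57$, $1/45$, $12/595$ and $10/504$, so the later estimates differ only slightly from one another even though the trend is not a polynomial.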


6. ADVANTAGES AND DISADVANTAGES OF THE METHOD. 


This method of trend elimination would seem to be most useful for cross-correlations which 
can be evaluated by the analysis of covariance. Its application, however, to serial correlations 
is both complicated and rather dubious in nature, since successive residuals will not be inde- 
pendent. 

Its application to stochastic trends will in general not lead to any spurious results, but it 
will cause an unnecessary loss of information. For example, consider the simplest case, 
namely, the cumulative sum. Obviously the employment of first differences will eliminate 
the stochastic trend and leave uncorrelated variables, so that the use of randomized blocks 
will lead to unbiased results but will unnecessarily reject one-half of the observations. The 
use of randomized blocks of a higher separation will not reject as many observations but will 
assume that successive differences are negatively correlated. Thus, for example, if $\epsilon_i$ and $\epsilon_{i+1}$
are successive differences in randomized blocks of three, $\epsilon_i-\epsilon_{i+1}$ and $\epsilon_i+\epsilon_{i+1}$ will be used as if
they were independent with relative variances 3:1. This will in general lead to unbiased
estimates of cross-correlations, although information will again be lost as a result of the
incorrect weighting of these comparisons.
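The ratio 3:1 can be seen from a short calculation. The sketch below assumes three observations $x_1, x_2, x_3$ in a block, independent with common variance $\sigma^2$ (the model implied by the randomized block analysis), and compares this with the cumulative-sum model, in which the differences themselves are uncorrelated with common variance $\tau^2$; the symbols $\sigma^2$ and $\tau^2$ are introduced here for illustration.

```latex
\begin{aligned}
\epsilon_i &= x_1 - x_2, \qquad \epsilon_{i+1} = x_2 - x_3,\\
\operatorname{Var}(\epsilon_i - \epsilon_{i+1}) &= \operatorname{Var}(x_1 - 2x_2 + x_3) = (1 + 4 + 1)\,\sigma^2 = 6\sigma^2,\\
\operatorname{Var}(\epsilon_i + \epsilon_{i+1}) &= \operatorname{Var}(x_1 - x_3) = 2\sigma^2,
\qquad \text{ratio } 3:1,\\
\text{cumulative sum:}\quad
\operatorname{Var}(\epsilon_i \mp \epsilon_{i+1}) &= 2\tau^2 \quad\text{for both comparisons, ratio } 1:1 .
\end{aligned}
```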

The method might also be used to estimate the errors of systematic sampling. Yates (1948) 
has already shown that, with caution, the sum of squares corresponding to the position of the 




observation in each block might be used to estimate the accuracy of a systematic sample. 
For this purpose the trend elimination is obviously unnecessary. However, since the residual 
mean square will provide an upper limit for the variation between such samples it might on 
occasions be used to supplement the estimate obtained by Yates’s method. 


7. EXAMPLES OF THE METHOD 


(a) As a first example of the method, the cubic used by Kendall (1947) to demonstrate the
variate-difference method was used. This cubic was

$$x_t = (t-26)+\tfrac{1}{10}(t-26)^2+\tfrac{1}{100}(t-26)^3+\epsilon_t,$$

where $\epsilon_t$ was randomly chosen between 0 and 99. For this series, Kendall gave the following
estimates of the second moments of the variate differences:

  Estimate
  1075.41
  1082.02
  1076.58
  1047.21
  1011.05
   975.20











so that randomized blocks of two should eliminate the trend adequately. Nevertheless, to
demonstrate the method the scheme $s = 2$, $d = 1$ has been used, i.e. straight lines have been
fitted to successive sets of three points. This has been done in Table 3. The first column gives
the values of $x_i$, $i = 1, \ldots, 51$, while the second column gives the adjusted end-values, in this
case by the formula $x_1' = \tfrac12(x_1+x_{51})$. The third column gives $X_j = x_{2j+1}+2x_{2j+2}+x_{2j+3}$ for the
adjusted values. The elements of the inverse matrix now have to be calculated. In this case
$n = 25$, so that these may be read directly from Table 2 and applied in the form of a moving
average to the $X_j$ to give the values of $B_j$ shown in column 4. These have been given exactly,
although normally four decimal places would suffice, so that the application of the automatic
checks can be seen. The total of the second column is derived from the total of the first by
subtracting 76.5, while the total of the third is derived from this by multiplying by 2, i.e.
by $s^{d}$. The total of the fourth column is theoretically derived from the total of the third by
multiplying by $s^{-2d-1}$, but owing to rounding-off errors in the $v_i$ this is not exactly true, and for
this reason the value $V$ is given in Table 2. Thus, in this case the total of column 4 is

$$6545.0 \times 0.1249999 = 818.1243455$$

as compared with its theoretical value of 818.1250000. The difference is small, but the exactness
of the check makes it worth noting. To complete the analysis the sum of squares of the
adjusted $x_i$, the sum of squares for blocks $\sum X_j B_j$, and the sum of squares for the position in
blocks $\tfrac{1}{50}(\sum x_{2j}-\sum x_{2j+1})^2$, where $x_1'$ is taken instead of $x_1$, were all calculated to give the overall
analysis shown in Table 4. The correction for the mean, 2.5943403, should be removed from
the blocks sum of squares if it is desired to test the effectiveness of the method, but this is
usually unnecessary. The agreement of the mean square with Kendall's value is very striking
and much better than would normally be expected. The smallness of the square due to the













Table 3. Method of trend elimination

   x_i     x_i'      X_j          B_j
   -96     76.5
   -90      -      -120.5    -40.36809720
   -17      -
   -32      -       -92       -5.60017485
   -11      -
   -59      -       -97      -18.03089050
    32      -
    28      -       110       16.78546010
    22      -
    62      -       196       27.31782615
    50      -
     2      -       133       15.30711740
    79      -
    -7      -       139       13.83899325
    74      -
    85      -       259       40.65855725
    15      -
    -4      -        68        1.20958165
    61      -
    39      -       140       20.08401820
     1      -
    51      -       156       18.28619100
    53      -
    48      -       191       26.19874130
    42      -
    10      -       137       15.52117180
    75      -
    37      -       161       17.67397240
    12      -
    96      -       274       39.43476980
    70      -
    30      -       182       19.71714810
    52      -
    64      -       214       24.26195915
    34      -
   126      -       344       48.71062335
    58      -
    57      -       267       27.47401355
    95      -
    75      -       403       53.44517760
   158      -
    99      -       454       54.85493515
    98      -
   159      -       572       71.42532170
   156      -
   180      -       717       88.59298860
   201      -
   239      -       900      114.01636025
   221      -
   270      -       837.5    127.30858030
   249      -
 Total
  3349   3272.5    6545.0    818.12434550

















86 On a method of trend elimination 


Table 4. Analysis of variance for trend elimination

                      Degrees of      Sum of         Mean
                       freedom        squares        square
  Blocks                  25         448274.31
  Position in block        1            406.12
  Residual                24          25777.82      1074.08
  Total                   50         474458.25




















Table 5. Variate differences of e^{-x} in the range 0.0 (0.1) 3.0

  Difference     Estimate x 10^10
      1             8306033.3
      2               25973.6
      3                  82.0
      4                   8.8
      5                   9.4

















position in block is as foreseen, but it seems advisable to remove this in all analyses as a
safeguard against an inadequate fit. It is fairly obvious that an oscillatory movement taking, say,
values $\pm 10$ alternately would be reflected in a term of about 5000 and could thus be detected.
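The arithmetic of example (a) can be retraced in a few lines. The sketch below is not the author's desk procedure: it solves the circulant equations directly with `numpy.linalg.solve` instead of applying the tabulated $v_i$ as a moving average, and it takes the 51 observations as read from column 1 of Table 3 (two of them, $-11$ and $52$, are illegible in the scan and are inferred here from the $X_j$ and the column total, which is an assumption).

```python
import numpy as np

# The 51 observations x_1, ..., x_51 (column 1 of Table 3).
x = np.array([
    -96, -90, -17, -32, -11, -59, 32, 28, 22, 62, 50, 2, 79, -7, 74, 85, 15,
    -4, 61, 39, 1, 51, 53, 48, 42, 10, 75, 37, 12, 96, 70, 30, 52, 64, 34,
    126, 58, 57, 95, 75, 158, 99, 98, 159, 156, 180, 201, 239, 221, 270, 249,
], dtype=float)

n = 25                                    # number of blocks for s = 2, d = 1
adj = x[: 2 * n].copy()                   # adjusted, circularly related series of 2n values
adj[0] = 0.5 * (x[0] + x[-1])             # end-correction x_1' = (x_1 + x_51) / 2

odd, even = adj[0::2], adj[1::2]          # x_1', x_3, ...  and  x_2, x_4, ...
X = odd + 2 * even + np.roll(odd, -1)     # X_j = x_{2j+1} + 2 x_{2j+2} + x_{2j+3}, wrapping round

eye = np.eye(n)
A = 6 * eye + np.roll(eye, 1, axis=1) + np.roll(eye, -1, axis=1)
B = np.linalg.solve(A, X)                 # least-squares constants

total_ss = np.sum(adj ** 2)
blocks_ss = X @ B
position_ss = (even.sum() - odd.sum()) ** 2 / (2 * n)
residual_ss = total_ss - blocks_ss - position_ss
residual_ms = residual_ss / (n - 1)
print(round(blocks_ss, 2), round(position_ss, 2), round(residual_ms, 2))
```

The totals reproduce the automatic checks described above (the $B_j$ sum to one-eighth of the total of the $X_j$), and the residual mean square comes out close to the value shown in Table 4.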

(b) As a second example values of $e^{-x}$ were taken to four decimal places for 0.0 (0.1) 3.0.

Theoretically, successive variances obtained by the variate-difference method will decrease
indefinitely, but in fact errors will prevent the calculation of more differences than are of
practical importance. Thus in this case the rounding-off errors have an expected variance of
$8.3 \times 10^{-10}$, and Table 5 verifies that the variance is steady at this level after the third
difference, so that there is no advantage in fitting schemes of degree greater than three.
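The levelling-off at the rounding variance can be illustrated numerically. This sketch assumes the usual variate-difference divisor $(N-k)\binom{2k}{k}$ and uses `numpy.round` to reproduce four-decimal tabulation; the expected rounding variance is $(10^{-4})^2/12 \approx 8.3\times10^{-10}$.

```python
from math import comb
import numpy as np

def variate_difference_variances(x, max_order):
    """k-th estimate: sum((k-th differences)^2) / ((N - k) * C(2k, k))."""
    diffs = np.asarray(x, dtype=float)
    out = []
    for k in range(1, max_order + 1):
        diffs = np.diff(diffs)
        out.append(float(np.sum(diffs ** 2) / (diffs.size * comb(2 * k, k))))
    return out

grid = np.arange(31) / 10.0                      # 0.0 (0.1) 3.0
values = np.round(np.exp(-grid), 4)              # e^{-x} to four decimal places
est = variate_difference_variances(values, 5)
rounding_variance = (1e-4) ** 2 / 12             # about 8.3e-10
print([round(e * 1e10, 1) for e in est], rounding_variance)
```

The first estimate is governed by the trend, while the fourth and fifth are of the same order as the rounding variance.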
A scheme of degree three was fitted in the same manner as for example (a), except that in this
case

$$x_1' = \tfrac18(x_1+7x_{29}),\qquad x_2' = \tfrac12(x_2+x_{30}),\qquad x_3' = \tfrac18(7x_3+x_{31})$$

and

$$X_j = x_{2j+1}+4x_{2j+2}+6x_{2j+3}+4x_{2j+4}+x_{2j+5}.$$


The calculation of end-corrections and the form of $X$ are not very difficult, especially for
schemes of degree one where we have

$$x_1' = \frac{x_1+(s-1)x_{sn+1}}{s},\qquad x_2' = \frac{2x_2+(s-2)x_{sn+2}}{s},\qquad x_3' = \frac{3x_3+(s-3)x_{sn+3}}{s},\qquad\text{etc.},$$

$$X_j = x_{sj+1}+2x_{sj+2}+3x_{sj+3}+\ldots+s\,x_{sj+s}+(s-1)x_{sj+s+1}+\ldots+x_{sj+2s-1}.$$
To simplify calculation further the corresponding coefficients for schemes of higher degree are 
given in Table 6. 
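The tabulated coefficients follow a simple pattern, which the sketch below generates. The rule used here is inferred from the entries of Table 6 and is not stated in the text: the coefficients of $X_j$ are taken as those of $(1+z+\ldots+z^{s-1})^{d+1}$, and the end-correction weights as running sums of the coefficients of $(1+z+\ldots+z^{s-1})^{d}$ divided by $s^d$. The function names are illustrative.

```python
import numpy as np

def block_coefficients(s, d):
    """Coefficients of X_j, taken as those of (1 + z + ... + z^(s-1))^(d+1)."""
    coeffs = np.ones(1, dtype=int)
    for _ in range(d + 1):
        coeffs = np.convolve(coeffs, np.ones(s, dtype=int))
    return coeffs.tolist()

def end_correction_weights(s, d):
    """Pairs (p, q) with p + q = s**d, read as x_i' = (p x_i + q x_{sn+i}) / s**d."""
    poly = np.ones(1, dtype=int)
    for _ in range(d):
        poly = np.convolve(poly, np.ones(s, dtype=int))
    running = np.cumsum(poly)[:-1]            # the last running sum is s**d itself
    return [(int(p), int(s ** d - p)) for p in running]

print(block_coefficients(3, 2))               # coefficients for s = 3, d = 2
print(end_correction_weights(3, 2))           # weights over the common divisor 9
```

For $s = 3$, $d = 2$ this gives weights $(1, 8), (3, 6), (6, 3), (8, 1)$ over 9, i.e. $\tfrac19(1, 8)$, $\tfrac13(1, 2)$, $\tfrac13(2, 1)$, $\tfrac19(8, 1)$ after reduction.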
Table 6. Coefficients for end-corrections and the representative equation

  s  d  e   x_i'                                                            X_j
  2  2  1   1/4 (1, 3), 1/4 (3, 1)                                          1, 3, 3, 1
  2  3  2   1/8 (1, 7), 1/2 (1, 1), 1/8 (7, 1)                              1, 4, 6, 4, 1
  2  4  2   1/16 (1, 15), 1/16 (5, 11), 1/16 (11, 5), 1/16 (15, 1)          1, 5, 10, 10, 5, 1
  3  2  2   1/9 (1, 8), 1/3 (1, 2), 1/3 (2, 1), 1/9 (8, 1)                  1, 3, 6, 7, 6, 3, 1
  4  2  2   1/16 (1, 15), 1/16 (3, 13), 1/8 (3, 5), 1/8 (5, 3),             1, 3, 6, 10, 12, 12, 10, 6,
            1/16 (13, 3), 1/16 (15, 1)                                      3, 1
  5  2  2   1/25 (1, 24), 1/25 (3, 22), 1/25 (6, 19), 1/5 (2, 3),           1, 3, 6, 10, 15, 18, 19,
            1/5 (3, 2), ...                                                 ...
  6  2  2   1/36 (1, 35), 1/12 (1, 11), 1/6 (1, 5), 1/18 (5, 13),           1, 3, 6, 10, 15, 21, 25, 27,
            1/12 (5, 7), 1/12 (7, 5), ...                                   ...
  8  2  2   1/64 (1, 63), 1/64 (3, 61), 1/32 (3, 29), 1/32 (5, 27),         1, 3, 6, 10, 15, 21, 28, 36,
            1/64 (15, 49), 1/64 (21, 43), 1/16 (7, 9), 1/16 (9, 7), ...     42, 46, 48, 48, 46, ...

For example, for the scheme $s = 3$, $d = 2$, $e = 2$,

$$x_1' = \tfrac19(x_1+8x_{3n+1}),\quad x_2' = \tfrac13(x_2+2x_{3n+2}),\quad x_3' = \tfrac13(2x_3+x_{3n+3}),\quad x_4' = \tfrac19(8x_4+x_{3n+4}),$$

$$X_j = x_{3j+1}+3x_{3j+2}+6x_{3j+3}+7x_{3j+4}+\ldots.$$

The completed analysis for the trend elimination for $e^{-x}$ is shown in Table 7. The residual
mean square is small and for most purposes the trend elimination would be sufficiently good,
but this mean square is very much larger than its theoretical value, and from this viewpoint the
trend elimination has not been successful. In fact, the difficulty arises owing to the different
slopes at either end of the range, one being roughly twenty times the other. This can be
demonstrated by analysing $e^{-\frac12 x^2}$ or, in this case, the normal ordinate. Table 8 shows that the variate
differences behave in the same manner as previously, while the analysis of variance in
Table 9 shows a reduction in the residual mean square due to the slopes at either end of the
range being more comparable. Table 10, which gives the residuals from the two analyses,
further verifies that the end-adjustments are largely the cause of the residual. It must be
noted that the sums of squares in Table 10 do not agree with the residual sums of squares
given in Tables 7 and 9. This is due to the rounding-off errors in $v_i$ which cause a rounding-off
error of 0.0000001 or one in 78125 in $V$. Usually, of course, this will be of no importance, but
in this case where the mean square due to trend is roughly $10^8$ times the true error mean square
it is of greater importance. The values given in Table 10 are the more accurate.

In general, the residuals from any fitting will indicate whether the end-corrections are
affecting the analysis. If the end-corrections are affecting the analysis then further analysis
will usually be necessary. The observations can be fitted without end-corrections, or a form of
partial end-correction can be used, although either method will be fairly lengthy. The
calculation can, however, be greatly shortened using the above inverse matrices. For example, the
fitting of


2b,, 6,+6,, 2b, ; i 
involves the inversion of 
| & 0 0 0 7 
6 0 
ee ea 0 














88 


We may multiply this matrix by








On a method of trend elimination 

































































6 1 0 1 ] -1 
6 0 0 
0 6 0 
000 0 a 
[i oy @ 6 | 
Table 7. Trend elimination analysis for e^{-x}

   x_i       x_i'       X_j         B_j
  1.0000    0.1782     4.0510    0.0114487107
  0.9048    0.4799
  0.8147    0.7191    10.0459    0.1050190925
  0.7408      -
  0.6703      -       10.6789    0.0820532406
  0.6065      -
  0.5488      -        8.8248    0.0688415858
  0.4966      -
  0.4493      -        7.2253    0.0556250230
  0.4066      -
  0.3679      -        5.9159    0.0458750583
  0.3329      -
  0.3012      -        4.8433    0.0374148344
  0.2725      -
  0.2466      -        3.9651    0.0306795883
  0.2231      -
  0.2019      -        3.2465    0.0251287380
  0.1827      -
  0.1653      -        2.6582    0.0205073968
  0.1496      -
  0.1353      -        2.1763    0.0169669999
  0.1225      -
  0.1108      -        1.7820    0.0134806760
  0.1003      -
  0.0907      -        1.4589    0.0119720304
  0.0821      -
  0.0743      -        1.3119    0.0076813439
  0.0672      -
  0.0608      -
  0.0550      -
  0.0498      -
  Total     8.5230    68.1840    0.5326943186

                      Degrees of      Sum of         Mean
                       freedom        squares        square
  Blocks                  14         3.7858845
  Position in block        1         0.0000005
  Residual                13         0.0004302     0.0000331
  Total                   28         3.7863152








= 




















Table 8. Variate differences of the normal ordinate in the range 0.0 (0.1) 3.0

  Difference     Estimate variance x 10^10
      1             1173518.3
      2                5630.5
      3                  59.3
      4                  11.0
      5                  10.8











Table 9. Analysis of variance for trend elimination

                      Degrees of      Sum of         Mean
                       freedom        squares        square
  Blocks                  14         1.1853296
  Position in block        1         0.0000027
  Residual                13         0.0000527     0.0000041
  Total                   28         1.1853850




















Table 10. Residuals from the fitting of e^{-x} and the normal ordinate

                            e^{-x}         Normal ordinate
                          -0.00319           -0.00297
                           0.01403            0.00597
                          -0.00452           -0.00284
                          -0.00749           -0.00120
                           0.00412            0.00093
                           0.00292            0.00062
                          -0.00193           -0.00044
                          -0.00127           -0.00023
                           0.00083            0.00017
                           0.00060            0.00011
                          -0.00039           -0.00013
                          -0.00026           -0.00004
                           0.00016            0.00002
                           0.00012            0.00004
                          -0.00002           -0.00003
                          -0.00013           -0.00001
                          -0.00006            0.00001
                           0.00016           -0.00003
                           0.00016            0.00005
                          -0.00030           -0.00006
                          -0.00049           -0.00004
                           0.00071            0.00001
                           0.00098            0.00028
                          -0.00151           -0.00037
                          -0.00229           -0.00032
                           0.00349            0.00054
                           0.00479            0.00086
                          -0.00932           -0.00102
  Total                   -0.00010           -0.00012
  Sum of squares           0.000447072        0.000057924
  Sum of squares for
    position in block      0.000446609        0.000055171





























90 On a method of trend elimination 


giving : 
Uo Uy Us Usen-2 Uen-1 
U, 1 0 Mies 
Uy 0 1 0 Usps 
3 
 Ugn_2 0 0 1 Uy 
[. Uen-1  Uen-2 Uen-3 -:- Uy Uy J 





which is more easily inverted. However, the simplest, though not the most efficient, method 
would seem to be the use of an analysis of covariance with dummy variates (Bartlett, 1937) 
to remove the effect of the end-terms. This would involve a covariance analysis on as many 
dummy variates as there are end-corrections, and will obviously be most useful for the 
simpler forms of trend, i.e. for trends of lower separation. Thus if we wish to correct a cross- 
correlation between series, a covariance must be used eliminating dummy variates repre- 
senting the end-corrections. 

However, the covariance analysis may be partitioned into components representing
adjustments for the differences in the first, second, ..., differentials at the beginning and at
the end of the observations. Thus, for example, in this case if the slopes at either end of the
series are $b-c$ and $b+c$, the observations may be taken as

$$a_1-3b+3c,\ a_2-2b+2c,\ a_3-b+c,\ a_4,\ \ldots,\ a_{2n},\ a_{2n+1}+b+c,\ a_{2n+2}+2b+2c,\ a_{2n+3}+3b+3c,$$

and the adjusted observations are

$$\tfrac18(a_1+7a_{2n+1}+4b+10c),\quad \tfrac18(4a_2+4a_{2n+2}+16c),\quad \tfrac18(7a_3+a_{2n+3}-4b+10c),\quad a_4,\ \ldots.$$

Thus a covariance analysis on

$$5,\ 8,\ 5,\ 0,\ 0,\ \ldots$$

will remove the portion of the residual variance which is due to the difference in slope at
either end of the range. Similarly, a covariance on

$$1,\ 0,\ -1,\ 0,\ 0,\ \ldots$$

will remove the portion which is due to the difference in second differentials at either end of the
range, etc. The great advantage of this device is that when we are dealing with forms with a
large number of end-corrections, instead of carrying out a covariance analysis on this
number of dummy variates, it will usually be sufficient, if necessary, to carry out one, or
possibly two, analyses. For example, a covariance on

$$1\times 9,\ 2\times 8,\ 3\times 7,\ 4\times 6,\ 5\times 5,\ 6\times 4,\ 7\times 3,\ 8\times 2,\ 9\times 1,\ 0,\ 0,\ \ldots,$$

i.e. $9, 16, 21, 24, 25, 24, 21, 16, 9, 0, 0, \ldots$,
should suffice.

In the above examples joint covariances on the three adjusted end-observations remove
0.000446223 in the $e^{-x}$ analysis and 0.000055030 in the normal ordinate analysis. Thus within
the limits of accuracy imposed by the rounding-off errors in $v_i$, these covariance analyses
account for the residual variances given in Table 10. These further analyses lead to corrections
in the adjusted values of $-0.0218$, $-0.0423$ and $-0.0172$ for $e^{-x}$ and $-0.0008$, $-0.0102$ and
$-0.0003$ for the normal ordinate. In the latter case a covariance for the second adjusted
observation eliminates nearly all (0.000054972) of the residual variance. If, however,
covariance analyses are carried out to remove the portions due to the differences in slope, these
are found to account for 0.000406947 and 0.000027532, which are the major portions of the
residual variation.


8. SUMMARY 


It has been suggested that trend might be eliminated using a series of consecutive poly- 
nomials of the same degree. A method has been given whereby the calculation can be rapidly 
carried out provided that the observations are circularly related. Ways of inducing circu- 
larity have been given assuming that the differential coefficients at either end of the range of 


observations are equal, and it has been shown how any deviations from this assumption can 
be adjusted using a covariance analysis. 


REFERENCES 
BARTLETT, M. S. (1937). Some examples of statistical methods of research in agriculture and applied
biology. J.R. Statist. Soc. Suppl. 4, 137-83.

FISHER, R. A. (1924). The influence of rainfall on the yield of wheat at Rothamsted. Philos. Trans. B,
213, 89-142.

KENDALL, M. G. (1947). The Advanced Theory of Statistics, 2. Griffin and Co.

YATES, F. (1948). Systematic sampling. Philos. Trans. A, 241, 345-77.








[ 92 ] 


ON THE ESTIMATION OF DISPERSION BY LINEAR 
SYSTEMATIC STATISTICS 


By H. J. GODWIN, University College of Swansea 


1. INTRODUCTION 


The purpose of this paper is to discuss the efficiency of estimates of the standard deviation of
a population which are obtained by ranking the observations of a sample and taking a linear
combination of them. Such statistics are termed systematic by Mosteller (1946). We shall
consider only the case in which the same rule of combination is used for every sample of a
given size; thus the mean deviation from the mean, which takes different forms according to
the position of the sample mean relative to the observations, will be excluded from the
following theory.

The best-known statistic satisfying the requirements is the range, whose probability
integral in samples from a normal population was tabulated by Pearson (1942) and Hartley
(1942). Earlier, Tippett (1925) had tabulated its mean value for sample sizes 2-1000, and
given formulae for higher moments, while E. S. Pearson (1926) had calculated second, third
and fourth moments for several sample sizes. Other measures which are the difference of two
observations have been proposed, such as the interquartile range, discussed by Hojo (1931),
and the difference of quindeciles, suggested by K. Pearson (1920). Mosteller (1946) discusses
these and other differences of symmetrically placed ranks which he calls quasi-ranges.
Recently, Nair (1947) has considered the mean deviation from the median. He propounds
several questions as to the usefulness of this statistic, and the present paper has grown from
an attempt to answer these. I had obtained the frequency function of this statistic in normal
samples (as Nair (1948) observed, after he had obtained it as a special case of a more
general result), but finding it intractable for further development did not attempt to
publish it.

The method used here is to consider the first and second moments of differences of
consecutive ranks. These were studied by Irwin (1925) and Pearson & Pearson (1931), the last
authors deriving results about the differences from a knowledge of the moments and
correlations of the ranks themselves. Recently, Hastings, Mosteller, Tukey & Winsor (1947) have
also tabulated the means, variances and covariances of ranks, and either of their results or
my results on rank differences given below is deducible from the other. However, my results
are computed to more places than theirs, and I have thus been able to give a more accurate
version of their table (Godwin (1949)). Some of the results which they computed have
been given exactly by Jones (1948), who has evaluated certain integrals connected with the
normal distribution. I have also extended this table in the paper referred to above.


2. FIRST AND SECOND MOMENTS OF THE DIFFERENCES BETWEEN RANKS 
Let the population studied have a distribution function $F(x)$, with a frequency function
$f(x) = (d/dx)\,F(x)$. We require the limits

$$\lim_{x\to-\infty} xF(x)\quad\text{and}\quad \lim_{x\to\infty} x\,(1-F(x))$$













H. J. Gopwin 93 


to exist and to be equal to zero. This is so if the second moment of $f(x)$ is finite, but this is not
a necessary condition, as illustrated by the case


$$f(x) = \frac{1+\lambda}{2(1+|x|)^{2+\lambda}}\qquad (0<\lambda<1).$$
Let $x_1, x_2, \ldots, x_n$ be a sample of $n$ from this population, such that $x_1 \le x_2 \le \ldots \le x_n$.
Let $y_i = x_{i+1}-x_i$ $(i = 1, 2, \ldots, n-1)$.

Any linear function of the $x$'s can be expressed as a linear function of the $y$'s added to a
multiple of $x_1$. If the function is to measure dispersion we require it to be zero when all the $x$'s
are equal and all the $y$'s thus zero. Hence the term in $x_1$ is zero and the statistic is a function
of the $y$'s only. Thus to find its first and second moments we need $E(y_i)$ and $E(y_i y_j)$ for all
$i, j$, $E$ denoting expectation.

If in the statistic the coefficients of $y_i$, $y_{n-i}$ are the same $(i = 1, 2, \ldots, [\tfrac12 n])$, we shall call it
symmetrical. Its value is then unaltered if the sample $(x_1, x_2, \ldots, x_n)$ is replaced by

$$(k-x_n,\ k-x_{n-1},\ \ldots,\ k-x_1),$$

where $k$ is any number.

It has been shown by Irwin (1925) that

$$E(y_i) = {}^{n}C_i\int_{-\infty}^{\infty} F^{i}(x)\,(1-F(x))^{n-i}\,dx,$$


an expression which, in a sample of n, Irwin denotes by y,, ;. If the distribution is symmetrical, 
$E(y_i) = E(y_{n-i}) = \tfrac12(E(y_i)+E(y_{n-i}))$, while if the statistic is symmetrical $y_i$ and $y_{n-i}$ always
occur added together. In both these cases the quantities we need can be expressed in terms
of the integrals

$$\psi(i) = \int_{-\infty}^{\infty} F^{i}(x)\,\{1-F(x)\}^{i}\,dx. \qquad (1)$$
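Irwin's expression for $E(y_i)$ is easy to check numerically. The sketch below is an illustration only: it assumes a standard normal population by default, integrates by a simple trapezoidal rule on a grid from $-12$ to $12$, and the function names and grid are arbitrary choices.

```python
import math
import numpy as np

def normal_cdf(x):
    """Distribution function of the standard normal population."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expected_gap(n, i, cdf=normal_cdf, lo=-12.0, hi=12.0, steps=48001):
    """E(y_i) = C(n, i) * integral of F^i (1 - F)^(n - i) dx, by the trapezoidal rule."""
    grid = np.linspace(lo, hi, steps)
    F = np.array([cdf(t) for t in grid])
    integrand = F ** i * (1.0 - F) ** (n - i)
    dx = grid[1] - grid[0]
    return math.comb(n, i) * float(np.sum((integrand[1:] + integrand[:-1]) * 0.5 * dx))

mean_range_2 = expected_gap(2, 1)                          # mean range for n = 2
mean_range_3 = expected_gap(3, 1) + expected_gap(3, 2)     # sum of the two gaps for n = 3
print(mean_range_2, mean_range_3)
```

For a normal population the mean range in samples of two is $2/\sqrt\pi$ and in samples of three is $3/\sqrt\pi$, and for a symmetrical distribution `expected_gap(n, i)` equals `expected_gap(n, n - i)`.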


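To obtain the relation we must express $F^{n-2i} + (1 - F)^{n-2i}$ in powers of $F(1 - F)$. Write $r = n - 2i$ and put
$$2F - 1 = u, \qquad F(1 - F) = v = \tfrac14(1 - u^2).$$
Then
$$F^r + (1 - F)^r = (\tfrac12(1 + u))^r + (\tfrac12(1 - u))^r = 2^{1-r} \sum_{s=0}^{[\frac12 r]} {}^rC_{2s}\,u^{2s} = 2^{1-r} \sum_{s=0}^{[\frac12 r]} {}^rC_{2s}\,(1 - 4v)^s.$$
The coefficient of $v^s$ in this is
$$(-1)^s\,2^{1-r+2s} \sum_{t=s}^{[\frac12 r]} {}^rC_{2t}\;{}^tC_s.$$
The sum is the coefficient of $x^{-2s}$ in $(1 + x^{-1})^r (1 - x^2)^{-s-1}$, that is, the coefficient of $x^{r-2s}$ in $(1 + x)^{r-s-1}(1 - x)^{-s-1}$,
$$= \sum_{t=0}^{r-2s} {}^{r-s-1}C_t\;{}^{r-s-t}C_s = 2^{r-2s-1}\,\bigl(2\,{}^{r-s}C_s - {}^{r-s-1}C_s\bigr).$$
Therefore
$$E(y_i) = \tfrac12\,{}^nC_i \sum_{s=0}^{[\frac12(n-2i)]} (-1)^s\,\psi(s+i)\,\bigl\{2\,{}^{n-2i-s}C_s - {}^{n-2i-s-1}C_s\bigr\}, \qquad (2)$$
a coefficient ${}^mC_s$ with $m < s$ being taken as zero.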
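As a numerical illustration of formula (2), the following sketch compares it with Irwin's integral for a standard normal population. The quadrature settings (Simpson's rule on $(-12, 12)$ with 960 panels) and all function names are assumptions made for this example, not part of the original computation; binomial coefficients outside their usual range are taken as zero.

```python
import math


def F(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def simpson(g, a, b, panels):
    """Composite Simpson's rule; `panels` must be even."""
    h = (b - a) / panels
    total = g(a) + g(b)
    for k in range(1, panels):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3.0


def binom(n, k):
    """Binomial coefficient, taken as zero outside 0 <= k <= n."""
    return math.comb(n, k) if 0 <= k <= n else 0


def psi(i):
    """Integral (1): psi(i) = integral of F^i (1 - F)^i."""
    return simpson(lambda x: (F(x) * (1.0 - F(x))) ** i, -12.0, 12.0, 960)


def mean_direct(n, i):
    """Irwin's formula: E(y_i) = nCi * integral of F^i (1 - F)^(n - i)."""
    g = lambda x: F(x) ** i * (1.0 - F(x)) ** (n - i)
    return binom(n, i) * simpson(g, -12.0, 12.0, 960)


def mean_from_psi(n, i):
    """Formula (2), for i <= n/2 in a symmetrical population."""
    r = n - 2 * i
    total = 0.0
    for s in range(r // 2 + 1):
        coeff = 2 * binom(r - s, s) - binom(r - s - 1, s)
        total += (-1) ** s * psi(s + i) * coeff
    return 0.5 * binom(n, i) * total


if __name__ == "__main__":
    for n in range(2, 7):
        for i in range(1, n // 2 + 1):
            print(n, i, mean_direct(n, i), mean_from_psi(n, i))
```

The two columns printed should agree to rounding error, since (2) is an algebraic rearrangement of the symmetrised integrand.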
Various identities can be obtained by means of this expression; for example, the mean 
ranges for odd sample sizes can be found from the mean ranges for even sample sizes. We can 













also obtain the identity, suggested by 'Student' and proved by E. S. Pearson (1926), connecting
$E(y_i)$ with mean ranges, viz.
$$E(y_i) = \tfrac12\,{}^nC_i\,(-1)^{i-1}\,\Delta^i w_{n-i}.$$
This is not, however, useful for computation, owing to the large number of additions and
subtractions involved. Starting, as Pearson did, with the mean range to five places of decimals,
we obtain $E(y_i)$ to only two places when $n = 10$.
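The loss of accuracy described here can be made concrete. The sketch below takes the identity in the form $E(y_i) = \tfrac12\,{}^nC_i\,(-1)^{i-1}\Delta^i w_{n-i}$ for a symmetrical population, with $w_m$ the mean range in samples of $m$ from a standard normal population. The quadrature settings and function names are assumptions of this example. With $n = 10$, $i = 5$, mean ranges rounded to five decimals can shift the result by as much as $126 \times 32 \times 0.5 \times 10^{-5} \approx 0.02$, which is the order of error the text reports.

```python
import math


def F(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def simpson(g, a, b, panels):
    """Composite Simpson's rule; `panels` must be even."""
    h = (b - a) / panels
    total = g(a) + g(b)
    for k in range(1, panels):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3.0


def mean_range(m):
    """w_m = integral of 1 - F^m - (1 - F)^m."""
    g = lambda x: 1.0 - F(x) ** m - (1.0 - F(x)) ** m
    return simpson(g, -12.0, 12.0, 960)


def forward_difference(w, m, order):
    """Delta^order w_m = sum_k (-1)^(order - k) C(order, k) w_(m + k)."""
    return sum((-1) ** (order - k) * math.comb(order, k) * w[m + k]
               for k in range(order + 1))


def mean_from_ranges(w, n, i):
    """E(y_i) from differences of mean ranges (symmetrical population)."""
    sign = (-1) ** (i - 1)
    return 0.5 * math.comb(n, i) * sign * forward_difference(w, n - i, i)


def mean_direct(n, i):
    """Irwin's formula: E(y_i) = nCi * integral of F^i (1 - F)^(n - i)."""
    g = lambda x: F(x) ** i * (1.0 - F(x)) ** (n - i)
    return math.comb(n, i) * simpson(g, -12.0, 12.0, 960)


n, i = 10, 5
w_full = {m: mean_range(m) for m in range(1, n + 1)}
w_5dp = {m: round(value, 5) for m, value in w_full.items()}  # five decimals

exact = mean_direct(n, i)
from_full = mean_from_ranges(w_full, n, i)
from_rounded = mean_from_ranges(w_5dp, n, i)

if __name__ == "__main__":
    print("direct integral      :", exact)
    print("from unrounded ranges:", from_full)
    print("from 5-decimal ranges:", from_rounded)
```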
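Irwin also showed that

$$E(y_i^2) = 2\,{}^nC_i \int_{-\infty}^{\infty} F^i(x) \int_x^{\infty} [1 - F(y)]^{n-i}\,dy\,dx. \qquad (3)$$

We put
$$\psi(a, b) = \int_{-\infty}^{\infty} F^a(x) \int_x^{\infty} [1 - F(y)]^b\,dy\,dx; \qquad (4)$$

for a frequency function symmetrical about the origin, $\psi(a, b) = \psi(b, a)$.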

To find $E(y_i y_j)$ we must first find the joint frequency function of $y_i$, $y_j$.

If $j > i+1$ this is

$$\frac{n!}{(i-1)!\,(j-i-2)!\,(n-j-1)!} \iint f(x_i)\,f(x_i + y_i)\,f(x_j)\,f(x_j + y_j)\,F^{i-1}(x_i)\,[F(x_j) - F(x_i + y_i)]^{j-i-2}\,[1 - F(x_j + y_j)]^{n-j-1}\,dx_i\,dx_j. \qquad (5)$$

When $j = i+1$ we put $x_j = x_i + y_i$, and omit the terms $f(x_j)\,[F(x_j) - F(x_i + y_i)]^{j-i-2}$. Although
this case needs special consideration initially, the final formula covers all cases. We now
multiply (5) by $y_i y_j$ and integrate over these variables from 0 to $\infty$. If we integrate by parts
with respect to $y_j$, put $y_j = w - x_j$, and integrate by parts with respect to $x_j$, we get


$$\frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!} \iiint y_i\,f(x_i)\,f(x_i + y_i)\,F^{i-1}(x_i)\,[F(x_j) - F(x_i + y_i)]^{j-i-1}\,[1 - F(x_j)]^{n-j}\,dy_i\,dx_i\,dx_j,$$
which is the same expression as arises in the case $j = i+1$. Proceeding further we put


$x_i + y_i = u$, integrate by parts with respect to $x_i$, change the order of integration of $u$ and $x_j$
and integrate by parts with respect to $u$, getting

$$\frac{n!}{i!\,(j-i)!\,(n-j)!} \int_{-\infty}^{\infty} \int_{x_i}^{\infty} F^i(x_i)\,[F(x_j) - F(x_i)]^{j-i}\,[1 - F(x_j)]^{n-j}\,dx_j\,dx_i.$$
On expanding $[F(x_j) - F(x_i)]^{j-i} = (1 - (1 - F(x_j)) - F(x_i))^{j-i}$
by the multinomial theorem, we have
$$E(y_i y_j) = {}^nC_{j-i}\;{}^{n-j+i}C_i \sum_{r=0}^{j-i} \sum_{s=0}^{j-i-r} (-1)^{r+s}\,{}^{j-i}C_r\;{}^{j-i-r}C_s\,\psi(i + r,\; n - j + s). \qquad (6)$$

The mean and variance of any function $\sum_i a_i y_i$ can now be found from (2), (3) and (6).
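Formula (6) can be checked exactly against a case where the answer is known. The sketch below uses the rectangular population of § 4, for which $\psi(a, b) = 4\,a!\,b!/(a+b+2)!$ and every product moment $E(y_i y_j)$ with $i \ne j$ equals $4/(n+1)(n+2)$. Exact rational arithmetic is used; the function names are illustrative only.

```python
from fractions import Fraction
from math import comb, factorial


def psi2(a, b):
    """psi(a, b) for the rectangular population f(x) = 1/2 on (-1, 1)."""
    return Fraction(4 * factorial(a) * factorial(b), factorial(a + b + 2))


def product_moment(n, i, j):
    """E(y_i y_j): formula (3) when i == j, formula (6) otherwise."""
    if i == j:
        return 2 * comb(n, i) * psi2(i, n - i)
    if i > j:
        i, j = j, i
    total = Fraction(0)
    for r in range(j - i + 1):
        for s in range(j - i - r + 1):
            sign = (-1) ** (r + s)
            weight = comb(j - i, r) * comb(j - i - r, s)
            total += sign * weight * psi2(i + r, n - j + s)
    return comb(n, j - i) * comb(n - j + i, i) * total


if __name__ == "__main__":
    n = 6
    for i in range(1, n):
        print([str(product_moment(n, i, j)) for j in range(1, n)])
```

For $n = 6$ the printed matrix has $8/56 = 1/7$ on the diagonal and $4/56 = 1/14$ elsewhere.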


3. THE MOST EFFICIENT LINEAR SYSTEMATIC STATISTIC 


We now show that it is possible to find $\alpha_i$ such that
$$d = \sum_{i=1}^{n-1} \alpha_i y_i$$
is more efficient than any other linear statistic for sample size $n$. The efficiency of $d$ is
inversely proportional to $(\operatorname{var} d)/(E(d))^2$, which we denote by $W(d)$. Writing for brevity

$$E(y_i) = e_i, \qquad E(y_i y_j) = e_{ij},$$












we require $(\sum\sum e_{ij}\alpha_i\alpha_j)/(\sum e_i\alpha_i)^2$ to be a minimum. It is convenient to minimize
$$R = \log\Bigl(\sum\sum e_{ij}\alpha_i\alpha_j\Bigr) - 2\log\Bigl(\sum e_i\alpha_i\Bigr).$$
We have
$$\frac{\partial R}{\partial\alpha_i} = \frac{2\sum_j e_{ij}\alpha_j}{\sum\sum e_{ij}\alpha_i\alpha_j} - \frac{2e_i}{\sum e_i\alpha_i}. \qquad (7)$$
The $(n - 1)$ equations $\partial R/\partial\alpha_i = 0$ may be written
$$\sum_j e_{ij}\alpha_j = \lambda e_i.$$
The determinant of coefficients on the left-hand side is the discriminant of the quadratic form
$\sum\sum e_{ij}\alpha_i\alpha_j$, and is non-zero since the form is positive definite, being $E(\sum\alpha_i y_i)^2$. Hence the $\alpha$'s
may be solved for in terms of $\lambda$, and since only their ratios are required, this is sufficient. It is
found that if the population is symmetrical, then so is the statistic.
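The procedure can be followed numerically for a normal population. The sketch below builds $e_i$ and $e_{ij}$ from formulae (2), (3) and (6), using the values of $\psi(i)$ and $\psi(i, j)$ quoted in Table 1, solves $\sum_j e_{ij}\alpha_j = e_i$ (that is, with $\lambda = 1$), and rescales so that $\sum \alpha_i e_i = 1$. The helper names and the small Gaussian-elimination routine are assumptions of this example; only the integrals needed for $n \le 4$ are included.

```python
import math

PSI1 = {1: 0.5641895835, 2: 0.0990037941}             # psi(i), Table 1
PSI2 = {(1, 1): 0.5000000000, (1, 2): 0.1955011094,   # psi(i, j), Table 1
        (1, 3): 0.1121677761, (2, 2): 0.0501571621}


def binom(n, k):
    """Binomial coefficient, taken as zero outside 0 <= k <= n."""
    return math.comb(n, k) if 0 <= k <= n else 0


def psi2(a, b):
    """psi(a, b) = psi(b, a) for a symmetrical population."""
    return PSI2[(min(a, b), max(a, b))]


def mean(n, i):
    """E(y_i) from formula (2), using E(y_i) = E(y_(n-i))."""
    i = min(i, n - i)
    r = n - 2 * i
    total = sum((-1) ** s * PSI1[s + i]
                * (2 * binom(r - s, s) - binom(r - s - 1, s))
                for s in range(r // 2 + 1))
    return 0.5 * binom(n, i) * total


def product_moment(n, i, j):
    """E(y_i y_j) from formula (3) (i == j) or formula (6) (i != j)."""
    if i == j:
        return 2 * binom(n, i) * psi2(i, n - i)
    if i > j:
        i, j = j, i
    total = sum((-1) ** (r + s) * binom(j - i, r) * binom(j - i - r, s)
                * psi2(i + r, n - j + s)
                for r in range(j - i + 1) for s in range(j - i - r + 1))
    return binom(n, j - i) * binom(n - j + i, i) * total


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    size = len(rhs)
    rows = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda k: abs(rows[k][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for k in range(col + 1, size):
            factor = rows[k][col] / rows[col][col]
            for c in range(col, size + 1):
                rows[k][c] -= factor * rows[col][c]
    solution = [0.0] * size
    for k in reversed(range(size)):
        tail = sum(rows[k][c] * solution[c] for c in range(k + 1, size))
        solution[k] = (rows[k][size] - tail) / rows[k][k]
    return solution


def best_coefficients(n):
    """Coefficients of the most efficient unbiased linear estimate."""
    index = range(1, n)
    e = [mean(n, i) for i in index]
    e2 = [[product_moment(n, i, j) for j in index] for i in index]
    alpha = solve(e2, e)                       # lambda = 1
    scale = sum(a * m for a, m in zip(alpha, e))
    return [a / scale for a in alpha]          # makes E(d) = 1


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, [round(a, 6) for a in best_coefficients(n)])
```

For $n = 4$ the result can be compared with the corresponding row of Table 3.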
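It remains to show that these values give a minimum value of $R$;

$$\frac{\partial^2 R}{\partial\alpha_i\,\partial\alpha_j} = \frac{2e_{ij}}{\sum\sum e_{ij}\alpha_i\alpha_j} - \frac{4(\sum_k e_{ik}\alpha_k)(\sum_k e_{jk}\alpha_k)}{(\sum\sum e_{ij}\alpha_i\alpha_j)^2} + \frac{2e_i e_j}{(\sum e_i\alpha_i)^2} = \frac{2e_{ij}}{\sum\sum e_{ij}\alpha_i\alpha_j} - \frac{2e_i e_j}{(\sum e_i\alpha_i)^2}, \quad\text{from (7)}.$$

Hence $\sum\sum \dfrac{\partial^2 R}{\partial\alpha_i\,\partial\alpha_j}\,d\alpha_i\,d\alpha_j$ is, by Schwarz's inequality applied to the positive definite form used above, a non-negative quadratic form which vanishes only when the $d\alpha_i$ are proportional to the $\alpha_i$, a change which leaves $R$ unaltered; and $R$ has a minimum.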


The most efficient linear statistic can thus be determined, given the population. The 
converse problem, of finding the populations for which a given statistic is most efficient, seems 
more difficult. A unique solution is not to be expected, since the range is best for both a 
rectangular population and the binomial population with frequencies $\tfrac12$, $\tfrac12$. It seems likely
that the ratios of the $\alpha$'s can only lie in certain ranges of values; on intuitive grounds one
would expect $\alpha_2/\alpha_1$, $\alpha_3/\alpha_2$, etc., to be large if the distribution is leptokurtic, so that less
reliance can be placed on the tails, and this is confirmed by a few special distributions for 
which I have worked out the best estimate for a sample of four. However, it does not seem 
that the correlation between $\alpha_2/\alpha_1$ and $\beta_2$ is exact.


4. RECTANGULAR POPULATION 
For this population all the integrals which occur can be evaluated explicitly. We have
$$\psi(i) = \frac{2\,(i!)^2}{(2i+1)!}, \qquad \psi(i, j) = \frac{4\,(i!)\,(j!)}{(i+j+2)!}$$
(taking $f(x) = \tfrac12$, $-1 < x < 1$), whence, in a sample of $n$,
$$E(y_i) = \frac{2}{n+1}, \qquad E(y_i^2) = \frac{8}{(n+1)(n+2)}, \qquad E(y_i y_j) = \frac{4}{(n+1)(n+2)}.$$
The most efficient linear statistic is the range, and
$$W(w) = \frac{2}{(n-1)(n+2)}.$$
The mean deviation from the median ($m'$) is defined by
$$2\nu\,m' = (y_1 + y_{2\nu-1}) + 2(y_2 + y_{2\nu-2}) + \ldots + \nu y_\nu, \quad\text{if } n = 2\nu,$$

and by
$$(2\nu+1)\,m' = (y_1 + y_{2\nu}) + 2(y_2 + y_{2\nu-1}) + \ldots + \nu(y_\nu + y_{\nu+1}), \quad\text{if } n = 2\nu+1.$$

This gives $E(m') = \nu/(2\nu+1)$ in samples of $2\nu$ and $2\nu+1$, while

$$E(m'^2) = \frac{3\nu^3 + 2\nu^2 + 1}{3\nu(2\nu+1)(2\nu+2)} \quad\text{in a sample of } 2\nu$$
and
$$E(m'^2) = \frac{2\nu(3\nu^2 + 5\nu + 1)}{3(2\nu+1)^2(2\nu+3)} \quad\text{in a sample of } 2\nu+1.$$
Hence
$$W(m') = \frac{\nu^2+\nu+1}{6\nu^3} \quad\text{in a sample of } 2\nu$$
and
$$W(m') = \frac{\nu+2}{3\nu(2\nu+3)} \quad\text{in a sample of } 2\nu+1.$$


Hence the efficiency of the mean deviation from the median relative to that of the range
decreases to zero as the sample size increases to infinity.
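The closed forms for the rectangular population can be confirmed by exact arithmetic. In the sketch below any linear statistic is represented by its coefficients on $y_1, \ldots, y_{n-1}$; the range has all coefficients equal to 1, and $m'$ has coefficient $\min(i, n-i)/n$ on $y_i$, which is the definition given above written in one line. The function names are illustrative.

```python
from fractions import Fraction


def relative_variance(coeffs):
    """W = var/(mean)^2 of sum c_i y_i in a rectangular sample of n."""
    n = len(coeffs) + 1
    first = Fraction(2, n + 1)                 # E(y_i)
    square = Fraction(8, (n + 1) * (n + 2))    # E(y_i^2)
    cross = Fraction(4, (n + 1) * (n + 2))     # E(y_i y_j), i != j
    total = sum(coeffs)
    sum_sq = sum(c * c for c in coeffs)
    mean = first * total
    second = square * sum_sq + cross * (total * total - sum_sq)
    return second / (mean * mean) - 1


def range_coeffs(n):
    """The range w = y_1 + ... + y_(n-1)."""
    return [Fraction(1)] * (n - 1)


def mean_deviation_coeffs(n):
    """Mean deviation from the median, m'."""
    return [Fraction(min(i, n - i), n) for i in range(1, n)]


if __name__ == "__main__":
    for n in (4, 5, 10, 11, 100, 101):
        w_range = relative_variance(range_coeffs(n))
        w_md = relative_variance(mean_deviation_coeffs(n))
        print(n, float(w_range / w_md))   # efficiency of m' relative to w
```

The printed ratio $W(w)/W(m')$ falls steadily as $n$ grows, in line with the statement that the relative efficiency tends to zero.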

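The equality of mean values of $m'$ in samples of $2\nu$ and $2\nu+1$ which occurs above is not a
coincidence, but is true for all populations. The mean value of $m'$ in a sample of $2\nu$ is

$$\int_{-\infty}^{\infty} \Bigl\{ \sum_{s=1}^{\nu-1} {}^{2\nu-1}C_{s-1}\,\bigl(F^s(1-F)^{2\nu-s} + F^{2\nu-s}(1-F)^s\bigr) + {}^{2\nu-1}C_{\nu-1}\,F^\nu(1-F)^\nu \Bigr\}\,dx.$$
Multiplication of the integrand by $F + (1 - F)$ gives

$$\int_{-\infty}^{\infty} \Bigl\{ \sum_{s=1}^{\nu-1} {}^{2\nu-1}C_{s-1}\,\bigl(F^s(1-F)^{2\nu+1-s} + F^{s+1}(1-F)^{2\nu-s} + F^{2\nu-s}(1-F)^{s+1} + F^{2\nu+1-s}(1-F)^s\bigr) + {}^{2\nu-1}C_{\nu-1}\,\bigl(F^\nu(1-F)^{\nu+1} + F^{\nu+1}(1-F)^\nu\bigr) \Bigr\}\,dx$$

$$= \int_{-\infty}^{\infty} \sum_{s=1}^{\nu} {}^{2\nu}C_{s-1}\,\bigl(F^s(1-F)^{2\nu+1-s} + F^{2\nu+1-s}(1-F)^s\bigr)\,dx,$$

which is the mean value in a sample of $2\nu+1$.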


5. NORMAL POPULATION 


With this population most of the integrals required cannot be evaluated, as far as one can 
tell, in terms of known functions, and some account will first be given of the calculations 
performed to give the results which follow. 


To evaluate the $\psi(i)$ $(i = 1, \ldots, 5)$,
$$F(x) = (2\pi)^{-1/2} \int_{-\infty}^{x} e^{-t^2/2}\,dt$$
was obtained from British Association Tables (1931) and $F^i(x)\,(1 - F(x))^i$ tabulated for $x$ at
intervals of 0.1. These were integrated from 0 to the value beyond which the integrand was
less than $10^{-10}$ by Simpson's rule, and checked by the 'three-eighths' rule.
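A modern re-run of this step is short. The sketch below applies Simpson's rule at interval 0.1 from 0 up to the first point at which the integrand falls below $10^{-10}$, and doubles the result by symmetry. It uses the library function `math.erfc` in place of the printed tables, which is a substitution made for this example; the function names are illustrative.

```python
import math


def tail_product(x, i):
    """[F(x) (1 - F(x))]^i for the standard normal, computed via erfc."""
    z = x / math.sqrt(2.0)
    return (0.25 * math.erfc(-z) * math.erfc(z)) ** i


def psi(i, h=0.1, cutoff=1e-10):
    """psi(i) by Simpson's rule on [0, X], doubled by symmetry.

    X is the first even multiple of h at which the integrand drops
    below `cutoff`, mirroring the stopping rule described in the text.
    """
    panels = 2
    while tail_product(panels * h, i) >= cutoff:
        panels += 2
    total = tail_product(0.0, i) + tail_product(panels * h, i)
    for k in range(1, panels):
        weight = 4 if k % 2 else 2
        total += weight * tail_product(k * h, i)
    return 2.0 * total * h / 3.0


if __name__ == "__main__":
    for i in range(1, 6):
        print(i, f"{psi(i):.10f}")
```

The value for $i = 1$ can be checked against $1/\sqrt{\pi}$, and the others against the first row of Table 1.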

On integrating by parts we have

$$\psi(a, 1) = \frac{1}{a+1} - \int_{-\infty}^{\infty} x\,F^a(x)\,[1 - F(x)]\,dx.$$

The integrals
$$\int_0^{\infty} x\,[2F(x) - 1]\,F^r(x)\,[1 - F(x)]^r\,dx \qquad (r = 1, \ldots, 4)$$

were computed as above and the required integrals obtained by algebraic combination. For
$\psi(i, j)$ $(i, j \ge 2)$, $\int_x^{\infty} (1 - F(y))^j\,dy$ was computed, using Cotes's formulae up to that founded on
an octic curve where necessary, and these functions were multiplied by $F^i(x)$ and integrated
again by Simpson's rule. To check the working, use was made of the identity


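$$\psi(i)\,\psi(j) = 2 \sum_{u=0}^{i} \sum_{v=0}^{j} (-1)^{u+v}\,{}^iC_u\;{}^jC_v\,\psi(i + u,\; j + v),$$

which is obtained by writing $\psi(j)$ as

$$\Bigl(\int_{-\infty}^{x} + \int_{x}^{\infty}\Bigr)\,\{F(y)\,(1 - F(y))\}^j\,dy,$$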







expanding $(1 - F(x))^i$, $(1 - (1 - F(y)))^j$ by the binomial theorem and changing the order of
integration where necessary. This involved every value except $\psi(1, 9)$, $\psi(3, 7)$ and $\psi(5, 5)$, and,
allowing for the accumulation of rounding-off errors in the use of the identity, agreement was
such as to give confidence in the last figure being not more than one or two units out, and that
only for the larger values of $i$, $j$.
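The check can be repeated on the printed values. The sketch below takes the identity in the form $\psi(i)\,\psi(j) = 2\sum_u\sum_v (-1)^{u+v}\,{}^iC_u\,{}^jC_v\,\psi(i+u, j+v)$ and evaluates both sides from the entries of Table 1 for three small cases. Only the table entries needed for those cases are copied in; the function names are illustrative.

```python
from math import comb

PSI1 = {1: 0.5641895835, 2: 0.0990037941}              # psi(i), Table 1
PSI2 = {(1, 1): 0.5000000000, (1, 2): 0.1955011094,    # psi(i, j), Table 1
        (1, 3): 0.1121677761, (1, 4): 0.0756542297,
        (2, 2): 0.0501571621, (2, 3): 0.0213262754,
        (2, 4): 0.0113867208, (3, 3): 0.0072090995,
        (3, 4): 0.0031921562, (4, 4): 0.0012076002}


def psi2(a, b):
    """psi(a, b) = psi(b, a) for a symmetrical population."""
    return PSI2[(min(a, b), max(a, b))]


def identity_gap(i, j):
    """Left-hand side minus right-hand side of the checking identity."""
    rhs = 2 * sum((-1) ** (u + v) * comb(i, u) * comb(j, v)
                  * psi2(i + u, j + v)
                  for u in range(i + 1) for v in range(j + 1))
    return PSI1[i] * PSI1[j] - rhs


if __name__ == "__main__":
    for pair in ((1, 1), (1, 2), (2, 2)):
        print(pair, f"{identity_gap(*pair):.2e}")
```

The gaps are of the size expected from rounding the tabulated values to ten decimals.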

A further check was attempted by calculating from these integrals some other integrals 
calculated by Hojo (1931), one set being expressible as linear combinations of the others. (In 
his notation the integrals involved are Tp, ..., T,, Ry, ..., Rg, aby, ---s els, gfe and 4J,). This did not, 
however, help very much, as his values are to eight decimal places only; also they differ from 
mine in the eighth place by one or two units in several cases, and where direct evaluation is 
possible my value proves to be the correct one. 

The values of the integrals $\psi(i)$ and $\psi(i, j)$ are given in Table 1; $E(y_i)$ and $E(y_i y_j)$ for sample
sizes 2 to 10, calculated from formulae (2), (3) and (6), are given in Table 2; and the coefficients
for $d$, the most efficient unbiased linear estimate of standard deviation, are given in
Table 3. Values which are algebraically identical (e.g. $E(y_i)$ and $E(y_{n-i})$) are given once only.
In Table 4 are given the efficiencies of a number of statistics relative to that of the maximum
likelihood estimate. For the latter I have taken the unbiased estimate derived from the sum
of squares about the sample mean, viz.

$$\frac{\Gamma(\tfrac12(n-1))}{\sqrt2\;\Gamma(\tfrac12 n)}\,\Bigl[\sum (x - \bar x)^2\Bigr]^{1/2}.$$
Thus, for example, the efficiency of $d$ is
$$\frac{(E(d))^2}{\operatorname{var} d}\,\Bigl\{\frac{(n-1)\,\Gamma^2(\tfrac12(n-1))}{2\,\Gamma^2(\tfrac12 n)} - 1\Bigr\}.$$
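As a worked instance, the efficiency of the range in samples of three can be computed from two entries of Table 1. The sketch assumes that efficiency is the ratio of $W = \operatorname{var}/(\text{mean})^2$ for the sum-of-squares estimate to $W$ for the statistic in question, and uses formulae (2), (3) and (6) specialised to $n = 3$. Names are illustrative.

```python
import math

PSI_1 = 0.5641895835     # psi(1), Table 1
PSI_11 = 0.5000000000    # psi(1, 1), Table 1
PSI_12 = 0.1955011094    # psi(1, 2), Table 1


def w_sum_of_squares(n):
    """var/(mean)^2 of the unbiased estimate based on the sum of squares."""
    ratio = math.gamma(0.5 * (n - 1)) / math.gamma(0.5 * n)
    return 0.5 * (n - 1) * ratio ** 2 - 1.0


def w_range_n3():
    """var/(mean)^2 of the range w = y_1 + y_2 in normal samples of 3."""
    e1 = 1.5 * PSI_1                    # E(y_1) = E(y_2), formula (2)
    e11 = 6.0 * PSI_12                  # E(y_1^2) = E(y_2^2), formula (3)
    e12 = 6.0 * (PSI_11 - 2 * PSI_12)   # E(y_1 y_2), formula (6)
    mean_w = 2.0 * e1
    mean_w_squared = 2.0 * e11 + 2.0 * e12
    return mean_w_squared / mean_w ** 2 - 1.0


def efficiency_range_n3():
    """Percentage efficiency of the range for n = 3."""
    return 100.0 * w_sum_of_squares(3) / w_range_n3()


if __name__ == "__main__":
    print(f"{efficiency_range_n3():.2f}")
```

The result can be compared with the entry for $n = 3$ in Table 4.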


Standard errors of the maximum likelihood estimate, the range estimate and the mean 
deviation estimate, denoted by ¢@,,, 7, and o,, respectively, were compared by Davies & 
Pearson (1934, see top row of their Table I).


Table 1. Integrals $\psi(i)$ and $\psi(i, j)$ of power products of normal tail areas
(equations (1) and (4))

  i             1              2              3              4              5

  $\psi(i)$     0.56418 95835  0.09900 37941  0.02015 46834  0.00435 75543  0.00097 35536

  j                                        $\psi(i, j)$

  1             0.50000 00000
  2             0.19550 11094  0.05015 71621
  3             0.11216 77761  0.02132 62754  0.00720 90995
  4             0.07565 42297  0.01138 67208  0.00319 21562  0.00120 76002
  5             0.05580 73500  0.00693 43673  0.00165 12449  0.00054 77852  0.00022 04352
  6             0.04357 46098  0.00460 22187  0.00095 98747  0.00028 10310
  7             0.03538 45805  0.00324 55847  0.00059 95945
  8             0.02956 98893  0.00239 45873
  9             0.02525 68141



































































Table 2. Moments and product moments of rank differences in normal samples

[Table 2 is printed sideways in the original. It gives $E(y_i)$ and $E(y_i y_j)$ for sample sizes $n = 2$ to 10; the entries are not legible in this copy.]


As far as extrapolation is possible from such small sample sizes, it seems that the range is 
the best statistic in current use only for sample sizes up to 6 (offsetting ease of computation 
against a slight loss of efficiency for n = 6). Further, that the mean deviation from the median 


Table 3. Coefficients, $\alpha_i$, in the most efficient unbiased linear estimate of standard deviation
in samples from a normal population

  Sample                          Coefficient of
  size, n    y_1            y_2            y_3          y_4        y_5

   2         0.88622 69255
   3         0.59081 79502
   4         0.45394 0395   0.56412 1139
   5         0.37238 157    0.50759 551
   6         0.31752 48     0.45608 45     0.49929 61
   7         0.27781 06     0.41290 78     0.47537 14
   8         0.24758 6      0.37703 4      0.44834 3    0.47130 0
   9         0.22373        0.34700        0.42210      0.45807
  10         0.20438        0.32158        0.39784      0.44142    0.45564

Note. The estimate is $d = \sum_{i=1}^{n-1} \alpha_i y_i$, where $\alpha_i = \alpha_{n-i}$.


Table 4. Percentage efficiencies of various estimates of $\sigma$ in a normal population

  Sample                                      Statistic
  size, n   d        m        m'       w        x_{n-1}-x_2   x_{n-2}-x_3   x_{n-3}-x_4   x_{n-4}-x_5

   2*       100.00   100.00   100.00   100.00
   3*        99.19    99.19    99.19    99.19
   4         98.92    96.39    91.25    97.52   25.24
   5         98.84    94.60    93.84    95.48   39.97
   6         98.83    93.39    90.25    93.30   49.55         13.49
   7         98.86    92.54    91.78    91.12   56.14         23.88
   8         98.90    91.90    89.76    89.00   60.85         32.05          9.03
   9         98.9     91.4     90.7     86.9    64.3          38.6          16.7
  10         99.0     91.0     89.4     85.0    66.8          43.8          23.3           6.7

Notation: $d$ denotes best linear estimate; $m$ denotes mean deviation from mean;
$m'$ denotes mean deviation from median; $w$ denotes range.

* For $n = 2$ the maximum likelihood estimate and all the linear ones are multiples of $y_1$. For $n = 3$ all
the linear estimates have similar distributions. Consequently in these two cases the efficiencies are
identical.


is less efficient than the mean deviation from the mean, but that the ratio of the efficiencies is
not less than 0.945 (which occurs when $n = 4$). The figures for the efficiency of $d$ show that it is
possible to choose from among linear estimates a statistic much more efficient than any
now in use. The possibilities of quasi-ranges remain uncertain, and they might repay




investigation for larger sample sizes. If the above methods are used, however, it will be
necessary to compute $\psi(i)$ and $\psi(i, j)$ to more places, as the chief loss of accuracy is from
the large number of additions and subtractions of multiples of these that are involved.


I am grateful to Prof. Pearson for suggesting a number of improvements to the original
draft of this paper.


REFERENCES 


British Association (1931). Mathematical Tables, 1.
Davies, O. L. & Pearson, E. S. (1934). J.R. Statist. Soc. Suppl. 1, 76-93.
Godwin, H. J. (1949). Ann. Math. Statist. (at press).
Hartley, H. O. (1942). Biometrika, 32, 309-10.
Hastings, C., Mosteller, F., Tukey, J. W. & Winsor, C. P. (1947). Ann. Math. Statist. 18, 413-26.
Hojo, T. (1931). Biometrika, 23, 315-60.
Irwin, J. O. (1925). Biometrika, 17, 100-28.
Jones, H. L. (1948). Ann. Math. Statist. 19, 270-3.
Mosteller, F. (1946). Ann. Math. Statist. 17, 377-408.
Nair, K. R. (1947). Biometrika, 34, 360-2.
Nair, K. R. (1948). Biometrika, 35, 118-44.
Pearson, E. S. (1926). Biometrika, 18, 173-94.
Pearson, E. S. (1942). Biometrika, 32, 301-8.
Pearson, K. (1920). Biometrika, 13, 113-32.
Pearson, K. & Pearson, M. V. (1931). Biometrika, 23, 364-97.
Tippett, L. H. C. (1925). Biometrika, 17, 364-87.




ON THE RECONCILIATION OF THEORIES OF PROBABILITY 


By M. G. KENDALL 


INTRODUCTION 


1. Few branches of scientific method have been subject to so much difference of opinion 
as the theory of probability. Even when we put aside numerous points of taste in presentation 
or axiomatization there remains a stubborn residual variance of viewpoint between different 
authorities. Everyone agrees that this is undesirable; nobody yet, I think, has dared to 
maintain that it is avoidable. But that is the theme of the present article. I wish to show that 
there is nothing necessarily incompatible in the varying views which are currently held; 
that the authorities are either saying the same thing in different ways or can only disagree 
because of avoidable latent differences in their premisses or their field of discussion. History, 
both past and present, teaches us that the role of mediator between contestants is neither safe 
nor profitable, and I am quite prepared to incur general disfavour for maintaining that most 
authors have some of the right but that none has a monopoly of it. In plunging into a con- 
troversial subject one must run the risk of controversy. I can only say that I have tried not 
to excite it.* 

2. It is convenient to consider the subject under three heads, Foundations, Direct Theory 
and Inverse Theory. The greatest differences of opinion (or, at least, those which have gener- 
ated the most heat) have arisen in the third, but there are substantial differences concerning 
the first. Even the second, as I shall point out, is not free from problems of a character
similar to those arising in the other two, though the fact is not normally given much
prominence and perhaps has not been generally appreciated.


FREQUENCY AND NON-FREQUENCY THEORIES 

3. Although there are many shades of opinion about the appropriate foundations of a 
theory of probability we may broadly distinguish two main attitudes. One takes probability 
as ‘a degree of rational belief’, or some similar idea, and in enunciating the foundations of 
the subject does not attempt to analyse it into simpler ideas; it is therefore necessary to 
agree upon certain axioms and postulates concerning probability itself before a definite 
theory can be founded. The second defines probability in terms of frequencies of occurrence 
of events, or by relative proportions in ‘populations’ or ‘collectives’; and attempts to base 
the theory on familiar concepts without invoking a special indefinable ‘probability’. In 
practice both approaches seem to lead to the same kind of direct theory, and it would be 
rare for exponents of the two to reach different mathematical conclusions from the same 
premisses. But this pragmatic reconciliation, though perhaps comforting to the onlooker, 
is not enough. The difference of viewpoint runs down to the heart of the subject and must be 
resolved if possible, not merely for intellectual comfort, but because it affects the theory of 
inference in which different authorities sometimes do reach different conclusions.

* The first personal pronoun occurs more frequently in this article than good taste in objective
scientific writing would normally permit. The reason is that I wish to leave no doubt about which
statements are advanced as personal views.













4. The first point to establish is that the current methods of axiomatizing the calculus
of probabilities are not sufficient to axiomatize a theory of probability. The pure mathematics
of a calculus of probabilities proceeds from certain basic rules (such as those governing the
addition or multiplication of probabilities of independent events) which are usually laid down
without much discussion of their rationale. The simplest approach is exemplified by the
procedure adopted in most text-books of algebra, wherein the probability of success of an
event which can happen favourably in m out of n possible ways is defined as the ratio m/n.
A more sophisticated approach is used by Kolmogoroff (1933) and Cramér (1945), for
example, by relating probability to the measure of a set of points. This is probably adequate


for the mathematician, who is more concerned with working out the logical consequences of 
his assumptions than with relating this calculus to the physical world. It fails, however, to 
solve the problem with which we are here concerned, namely, to found a theory of probability 
as a branch of scientific method. In other branches of science it may be enough to set up a 
mathematical model and to accept it if it gives a reasonably good account of observation; 
but in the most general sense we are here considering how good a ‘reasonably good’ agreement 
must be. 

5. It has sometimes been suggested that the domain in which the statistician is interested 
is only part of the whole domain covered by the phrase ‘uncertain inference’; that, for ex- 
ample, other people may reasonably be concerned with gauging the degree of doubt in 
propositions concerning unrepeatable events, such as ‘Homer was blind’, whereas the
statistician operates with series of similar propositions or events because his science is the 
study of aggregates. The suggestion is, I think, that the statistician can avoid some of the 
difficulties of uncertain inference by confining himself to classes of propositions concerned 
with experimentally repeatable events. It might even be questioned whether ‘probability’ 
as the statistician uses the word has the same connotation as when it is used by logicians or 
scientists in relation to individual hypotheses. Perhaps the psychologist can answer this 
question for us. My own opinion is that there is no essential difference (other than that of
intensity) between the attitudes of doubt towards any propositions of whose truth we are 
uncertain. Something may be said for distinguishing those hypotheses which are capable of 
experimental verification from those which are not; it might even be (though I do not believe 
it) that different kinds of uncertain inference are appropriate to the two cases; but it does not 
appear to me that the statistician solves any essential problems by confining himself 
to special classes of case. In practice he is concerned, like any other scientist, with the 
formation of opinions and the taking of decisions on incomplete evidence, and the mere fact 
that some of his data can be counted or measured does not, I maintain, relieve him of any 
major problem arising in the justification of his inferential processes. 

6. I assert that any theory of probability which does not take probability itself as a primitive
idea must, in some form or other, introduce an equivalent primitive before it can be applied.

The rather naive approach of the algebraic text-book, for example, may take one of two 
main forms: it can state that if, of a set of n mutually exclusive propositions, m are favourable, 
the probability that a favourable proposition is true is m/n; or it can state that if m out of n 
equally probable and mutually exclusive propositions are favourable and one must be true, 
the probability of a favourable proposition is m/n. The first form is untrue (a man can be 
unidextrous in two ways, as right- or left-handed, but the probability is not 1/2 that a man is 
right-handed in any theory that I know); the second begs the question by the use of the 
expression ‘equally probable’. There are, of course, variants on the way in which this idea 




is put; we may speak of events instead of propositions, or refer to events happening ‘at 
random’. But they all come to much the same thing so far as concerns the point now under 
discussion. The concept of probability contains more than the mere idea of proportionality 
of cases. And, of course, the same argument applies when proportionality is replaced by the 
measure of a set or some more refined mathematical concept. 

7. The mathematician may fairly contend that such matters are not his concern, any more 
than the Euclidean geometer is concerned with the question whether there exist in practice 
objects which can be regarded as straight lines. One must be a little careful about arguing 
from the analogy of other branches of applied mathematics because at this stage we are 
concerned not only with the relationship of theory with the external world but also with the
relationship between our calculus and the way we think. Even if we pass over such a point, 
however, and admit the mathematician’s right to deny interest, qua mathematician, we must 
accept responsibility for resolving the difficulty in some other capacity, as psychologists, 
logicians, or statisticians, unless we are prepared to relegate probability to the domain of 
pure mathematics and the possibility of its application to the phrontisteries of our academies. 

8. Von Mises (1928) has made a valiant attempt to provide a frequency theory of probability,
and for a time I was almost convinced that he had succeeded. Myself when young
did eagerly frequent. But his theory fails, in my view, in much the same way as the other
theories mentioned in § 6. He has, in fact, to introduce the idea of an ‘Irregular Kollektiv’
(English writers would call it a random series) or a ‘Prinzip des ausgeschlossenen Spielsystems’,
that is, the impossibility of a winning system in games of chance. The ‘Irregular Kollektiv’
itself is a new concept outside ordinary mathematics. Attempts by various writers to
show that there exist sequences of numbers with the necessary properties break down, I hold, 
on the point that such sequences can only be shown to obey an enumerable set of conditions, 
whereas the ‘ Kollektiv’ must obey an innumerable set (unless the definition is to be modified, 
in which case we arrive at the repugnant conclusion that the laws of probability may be 
obeyed sometimes, but not always and not even, relatively speaking, frequently). 

9. I myself do not believe that anyone will ever succeed in producing a theory of pro- 
bability which does not, at some point, require a primitive idea equivalent to that of pro- 
bability itself. It is necessary, I hold, to have some concept of randomness, haphazardness 
or uncertainty of happening inherent in the subject which either must be introduced ex- 
plicitly into the axiomatization or, if omitted, must be introduced later before the theory is 
capable of application. Precisely where the idea is introduced is to some extent a matter of 
taste or didactic convenience. But appear it must. There can, I assert, be no pure frequency 
theory of probability any more than (to anticipate a later argument) there can be a useful
objective theory of probability without reliance at some point on empirical justification in 
terms of frequency. 

10. One point, however, is worth examination at this stage. Must we take as our primitive 
a general idea of probability which permits of the comparison of any two propositions, or is 
it sufficient to use only the idea of equal probability? The axiomatization referred to in § 6
suggests that equi-probability is sufficient if we can analyse our alternative possibilities to 
the point where they all stand on the same footing. If we can, as it were, break down our 
situation into atomic propositions with equal probability, a mathematical measure of the
probability of a subset follows readily enough. 

11. At first sight this does not appear to be possible. The probability that a new-born 
child is male is (say) 0-51 but it does not look as if we can regard this as one of a subset of 










51 favourable propositions out of 100 equi-probable propositions. But I think this is not 
a fatal objection. We arrive at the probability of 0-51 by counting cases, and it is assumed 
in doing so that the probability of occurrence of a male in each case is the same; in fact, we do 
base our probability on equi-probable events, and the same is true of any probability based 
on statistical frequencies. I am not prepared to say that we can in every instance schedule 
the equi-probable events explicitly. General judgements in probability often have to be 
made on the basis of a ‘feeling of the situation’ which is compounded of a multitude of 
relevant factors half-remembered or not explicitly remembered at all. There is, as Jeffreys 
has pointed out, an element of uncertainty attributable to the imperfection of the human 
mind itself. I am, nevertheless, inclined to think that a satisfactory theory can be founded
merely on the notion of equi-probability or the equivalent notion of randomness. But it 
is not necessary to insist on the point. What I do insist upon is the necessity for a primitive 
idea of probability or randomness of some kind. 

12. It might be thought that the differences between the frequentists and the non- 
frequentists (if I may call them such) are largely due to the difference of the domains which 
they purport to cover. I assert that this is not so. One of the principal modern advocates of
the non-frequency approach is Jeffreys, who takes probability as a measure of belief and is 
concerned with the application of his theory to scientific inference in general. But practically 
every example in Jeffreys’s book can be treated by the methods of the frequentists (or so they 
would claim). The fact is that in practice both schools deal with the same kind of problem. 
They differ because they approach the same problems differently, not because they deal 
with different problems. 

13. The essential distinction between the frequentists and the non-frequentists is, I think, 
that the former, in an effort to avoid anything savouring of matters of opinion, seek to define 
probability in terms of the objective properties of a population, real or hypothetical, whereas 
the latter do not, and, indeed, sometimes go further and repudiate the introduction of 
a population as irrelevant, incompetent and immaterial. To revert to the example of the 
sex-ratio in births, the frequentist would, at the outset of the inquiry, postulate the existence 
of a probability p that a birth was male in some population or ‘Kollektiv’ and then proceed 
to estimate it in the light of experience. The non-frequentist would begin by assuming a prior 
probability for the ratio and would then modify it in the light of the posterior probabilities
given by observation. If the observations are numerous his prior probabilities dwindle into 
insignificance and he gets much the same formula as the frequentist. But he is not, on the 
face of it, trying to do the same thing. The frequentist is estimating an unknown constant;
the non-frequentist is determining a probability which is not a constant, but varies according 
to the state of his knowledge. 

14. Suppose I draw a random sample of 1000 births and find that 600 are male. Do
I then infer (since this differs significantly from what could happen, to an acceptable degree
of probability, in sampling from a population wherein the proportion of males is 51%) that
the proportion of males cannot be 51%? Most assuredly I do not. There is so much prior
evidence in favour of a ratio of about 51% as against a ratio of 60% that I require much more
posterior evidence than this to shake my prior probability seriously. My previous knowledge 
must affect my judgement. The frequentist cannot, I think, question this. He must then do 
one of two things. He must either admit that probability is influenced by prior knowledge 
if that probability is by itself to be the basis of a rational judgement or course of action; or 
he must concede that the final judgement is based on a mixture of probability and some 







different quality of uncertain inference which conditions it. It seems to me that if he takes 
the second course he lays himself open to the criticism that his theory fails to solve the 
problem with which he is concerned. One does not obtain an objective theory of inference 
by separating it into an objective and a non-objective part and ignoring the second. Perhaps 
the frequentist can maintain that his theory is consistent and logical. He cannot maintain 
that it provides more than a part of the answer to the fundamental question. 
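
The weight of prior evidence in the births example of § 14 can be illustrated numerically. The following is a minimal sketch, not part of the original argument: it assumes a Beta prior on the proportion of males whose strength (equivalent to 100,000 earlier births at 51 %) is a purely hypothetical choice, and the function names are illustrative.

```python
def beta_posterior(a, b, successes, failures):
    """Conjugate update: Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return a + successes, b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical informative prior: as if 100,000 births had already been
# seen with 51 % male. The strength of this prior is an assumption.
prior_a, prior_b = 51_000, 49_000

# The sample of section 14: 600 males in 1000 births.
post_a, post_b = beta_posterior(prior_a, prior_b, successes=600, failures=400)
posterior_mean = beta_mean(post_a, post_b)

# The same data combined with a flat Beta(1, 1) prior, for comparison.
flat_a, flat_b = beta_posterior(1, 1, successes=600, failures=400)
flat_mean = beta_mean(flat_a, flat_b)

print(round(posterior_mean, 4))  # 0.5109
print(round(flat_mean, 4))       # 0.5998
```

Under the strong prior the sample barely moves the estimate away from 51 %, whereas under the flat prior the estimate follows the sample almost exactly.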

15. I shall revert to this point later in considering the statistical theory of confidence 
intervals. It is sufficient for the moment to record the point that in arriving at a probability 
of an observed event even the frequentist must make some assumption about prior pro- 
babilities. This may, in the last analysis, mean no more than that he has to postulate of his 
observations that they are conforming to a random process of specified type; but that itself 
is an assumption at least of equi-probability, which is an assumption concerning prior 
probabilities. I therefore assert that, to assign a probability in any practical case the frequentist 
as well as the non-frequentist requires a prior probability distribution from which to start. 

16. This raises two problems: (a) how, in any given situation, we determine the prior 
probabilities, and (b) how we justify the contention that what we are doing has any 
application in real life. 

The precise determination of prior probabilities may be a matter of practical difficulty, 
but I do not think this constitutes an objection. We may have a fair idea of the temperature 
of a room without being able to express our impression very precisely in degrees on a thermo- 
metric scale; and similarly, we may have a rough idea of measurable prior probabilities 
without being able to say exactly what they are. But if we pursue these general impressions 
back to their source, how do they arise? Not, I think, by any innate knowledge, if there is 
such a thing, but from even more prior probabilities. The prior values which we assign on 
any given occasion are themselves posterior values of a previous experience. Where did 
they start? 

17. It is difficult (but not impossible) for an adult to find any situation in which he has 
no prior knowledge whatever to influence his assessment of a probability. There may, how- 
ever, have been moments in his childhood when his ignorance was complete. This is not, so 
far as I can see, necessarily so because in the very act of learning the meaning of a proposition 
he may have acquired some expectation as to its truth. It is for the psychologist to explore 
these topics. For my present purpose I need only emphasize that prior probabilities are 
themselves built up from experience, albeit an unremembered and unconscious experience. 

18. Jeffreys and some other writers have endeavoured to meet the ‘situation of initial 
ignorance’ by laying down certain rules determining the types of prior probability distribu- 
tion to be adopted in specified cases. When Jeffreys’s book first appeared this seemed to me 
the least convincing part of his treatment, and it seems so still. In fact, my doubts increase 
as the rules proposed for different situations multiply. It cannot be necessary to adopt 
peculiar rules for prior distributions in order to say that we know nothing. As I understand 
him Jeffreys argues that his rules are necessary in some cases because otherwise there would 
arise mathematical difficulties. But this seems to me a circular argument, and more a ground 
for examining the mathematical treatment than a justification of the theory. In reading 
some of the papers in this vein one sometimes has difficulty in resisting a cynical suspicion 
that certain kinds of distribution are introduced because they give the right kind of 
answer, which may be a posterior justification but is unsatisfactory in laying down the bases 
of a subject. 











106 Reconciliation of theories of probability 
19. I assert: 


(a) That the assignment of a probability to observed phenomena requires in all cases the 
assignment of a prior probability or some equivalent procedure such as the assumption that the 
generating process is random. 

(b) That situations rarely, if ever, arise in which there is no knowledge of prior probabilities. 

(c) But that, if such a situation arises, the only possible rule to use is that of Bayes in which 
all the possibilities are given the same prior probability. 

Let me add two scholia: 

(i) Some difficulties arise over the mathematical treatment when we are considering 
parameters which may have an infinite range or even when they are continuous over a finite 
range. These difficulties, I hold, are mathematical in the sense that ideas of continuity and 
limiting processes are mathematical. They raise problems, but the existence of such problems 
is not very relevant to the basis of the theory of probability. 

(ii) As data accumulate the posterior probability dominates the total probability so that 
it makes very little difference what the prior probabilities were. This is an argument for using 
prior distributions which are plausible but inexact if they make the mathematics easier, 
at least for large samples. It in no way affects the points now under discussion. 
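
Scholium (ii) can be made concrete in one simple case. The following worked equation is an illustration only: it assumes a binomial sampling process with s successes in n trials and a Beta prior with parameters a and b, none of which appear in the original text.

```latex
\begin{align*}
E(\theta \mid s, n) &= \frac{a + s}{a + b + n}
  = w\,\frac{a}{a + b} + (1 - w)\,\frac{s}{n},
  \qquad w = \frac{a + b}{a + b + n}, \\
w &\to 0 \quad \text{as } n \to \infty .
\end{align*}
```

The posterior mean is a weighted average of the prior mean a/(a + b) and the observed ratio s/n, and the weight attached to the prior vanishes as the sample grows, whatever values of a and b were chosen.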

20. So far my summing up has been in favour of the non-frequentists on the points con- 
cerning the impossibility of founding the theory without some primitive idea of probability 
and the necessity of using prior probabilities. I must now restore the balance, and show that 
the non-frequentist requires notions of frequency in some form or other to make his theory 
of any practical importance if it is to be a numerical theory. 

The frequentist tries to give his probabilities objectivity by basing them on frequency of 
occurrence. The non-frequentist would also like to obtain objectivity, of course, but he is 
concerned with the measurement of an attitude of mind which is subjective. He may, as 
Keynes did, beg the question by speaking of degrees of rational belief as if rational minds 
could not differ in the assessment of a probability on the same evidence, which in the ultimate 
analysis implies that a belief founded on the same data is only rational if it agrees with one’s 
own. Jeffreys makes a better attempt, I think, by admitting that there is no logical answer 
to the solipsist but offering to believe in him on the basis of reciprocal aid. However, one must 
not make too much of this kind of point. We all assume that there are certain rules of thought 
which are common to all rational human beings. Let us then suppose that the rules of pro- 
bability are commonly accepted. Let us further suppose that on given data two different 
individuals will arrive at the same assessment of prior probabilities. This is a very consider- 
able assumption to make because the ‘background’ of individuals may vary so much that 
we cannot be sure that the data can ever be the same; consequently each individual may 
possess his own schedule of probabilities, and we do not emancipate ourselves from matters 
of opinion. However, in order to press on to the main difficulty, let us make the assumption. 
We then arrive at the fundamental question: what use is it to calculate a probability ? 

21. There is no point in calculating probabilities as an arithmetical exercise unless we 
are willing to relegate the theory of probability to the position of a branch of pure mathematics. 
They must either be pure measures of a mental attitude or they must correspond to something 
in the external world. If they are only measures of belief, they may still form the basis of 
rational actions and rational decisions. But why are they of any use in this respect unless 
our decisions or actions are improved by them—in fact, unless we are more often right by 
using them than by ignoring them? The frequentist does not encounter this difficulty, for 







his probabilities purport to describe observed frequencies. The non-frequentist, it seems to 
me, needs a new basic assumption. 

I assert that, for a non-frequentist theory of probability to be applicable it is necessary to assume 
that propositions with greater probability are true more often in fact than propositions with less 
probability.* 

22. Certain minds, including some types of trained scientific mind, will instantly revolt 
at any contention that there is a positive correlation between what we believe and what is 
true. The scientist, very properly, is trained to suspect even his own beliefs and, remembering 
that whole populations have believed that the world is flat or that heat is a fluid, is not even 
very impressed by an overwhelming body of collective opinion. And yet the number of 
instances in which people have believed something wrong is relatively small; and frequently 
the erroneous propositions which have been most widely believed are those whose meaning 
is in any case abstract and obscure. Every conscious movement I make is based on the 
belief that what has happened before will happen again. The use of language itself is a testi- 
mony to this belief, and anyone who denies it has to explain why he thinks that the sounds 
he utters or the marks he inscribes on paper in making the denial will be understood. 

23. I am not arguing that there is any less reason than in the past to seek for evidence, 
and as much of it as one can get, before coming to conclusions. I am only saying, first, that 
in many ordinary decisions we already have strong prior probabilities, and secondly, that 
we may have to take decisions on very slight evidence or on prior probabilities which only 
just favour one course as against the alternative; and I ask, what grounds have we for sup- 
posing that our probabilities are a good guide to conduct? It is here, I think, that the 
frequentist can assert himself; for the only grounds are that we have found in the past that 
it is so. Our reliance on our probabilities is based on the frequency with which we have found 
it justified in the past. This, I think, is the pragmatic frequentist sanction for a non-frequen- 
tist theory. It works. 

24. The line of thought which leads me to suggest that a reconciliation of the frequentist 
and non-frequentist views is possible should now be clear. The frequentist seeks for objectivity 
in defining his probabilities by reference to frequencies; but he has to use a primitive idea 

of randomness or equi-probability in order to calculate the probability in any given practical 
case. The non-frequentist begins by taking probability as a primitive idea, but he has to 
assume that the values which his calculations give to a probability reflect, in some way, the 
behaviour of events. Frequentiam furca expellas, tamen usque recurret. Neither party can 
avoid using the ideas of the other in order to set up and justify a comprehensive theory. 
I believe that if this is firmly grasped the fundamental differences vanish. There may still 
be room for argument about the best way of presenting the subject from the logician’s or 
the teacher’s point of view, but that is quite a different matter. 


FOUNDATIONS 


25. In laying down the collection of axioms, postulates and general rules-of-the-game 
which provide the foundations of a theory, we have a substantial amount of choice which is 
exemplified by the very different treatments given by different authors under the title of 

* In this article I speak sometimes of probabilities of events, sometimes of propositions, sometimes of 
propositions that are true, sometimes of events that will happen. Such terms require definition and 
clarification in a highly rigorous exposition, but to have attempted such a thing here would have obscured 
the main points I am making. I am satisfied that in omitting such a treatment I am not glossing over any 
difficulties which affect my main conclusions. 













‘Theory of Probability’. The mathematician is apt to skate rather lightly over the problems 
of relationship with experience in order to press on to the calculus, which is his primary 
interest. The logician goes more deeply into the correct axiomatization but sometimes does 
not proceed to develop the theory to the point of showing its practical utility. One cannot 
complain of this, for only an encyclopaedist can do justice to the subject in its entirety, and 
even a collective work like Borel’s Traité leaves large areas untouched. One can indeed 
complain of the lack of sympathetic understanding shown by some authors to the works 
of others; but it is unfortunately true that among the exponents of the science of objective 
judgement there seem to be as many emotional judgements as among less enlightened sections 
of the community. 

26. The very extent of the domain to be covered requires an author to be selective, but 
there is a serious danger that his selection may give a bias to his treatment. For instance, 
it is a fairly common practice to begin with the throwing of dice or the tossing of coins as 
illustrations of the ‘kind of situation’ with which the probabilist is to deal, as if the theory 
of probability were the same thing as the doctrine of chances. It is not part of my purpose 
to discuss such matters on the present occasion, but it is fair to point out that a teacher ought 
to be very careful not to put only one side of the case to his students. Admittedly he must 
start with simple ideas and not destroy their self-confidence by leading into the difficulties 
at the beginning of the subject—as in mathematics itself, the fundamentals are the most 
difficult and the beginnings should come last. But in reading most modern treatments 
I cannot help feeling that a little more impartiality would be an improvement, and that an 
author should do more than ignore opposing views or mention them merely to refute them. 

27. My discussion of the frequentist and non-frequentist viewpoints in §§ 3-24 has left 
me little more to say about fundamentals. I have only two points to make. The first concerns 
the introduction of probabilities associated with continuous variables. From the idea of 
the probability of events we can build up the idea of the probability of variate values, and 
hence the idea of a probability function in the discontinuous case. It is then tempting for 
the mathematician to jump to the continuous case without pausing very long for thought 
and to write such expressions as 


dF = f(x) dx, (1) 


which purport to express the fact that the element of probability dF in the range dx is 
f(x) dx. A variate transformation is then made by writing x = x(ξ), and we have 


dF = f{x(ξ)} (dx/dξ) dξ, (2) 


expressing that the probability in the range dξ is f{x(ξ)} (dx/dξ) dξ. 

The retention of the differential element is thus very important in the expression of a 
probability. It is well known that there is something arbitrary in the determination of 
probabilities in a continuum. But on occasion we ignore the differentials and concentrate 
on the frequency function f, as, for example, in methods involving the likelihood function. 
There are contexts where it is important to remember that likelihoods in this sense are not 
invariant under variate transformations, so that tests based on likelihood ratios contain an 
implicit assumption about the random process generating the observations. 
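
The role of the differential element can be checked numerically. The following is a minimal sketch under assumed choices that are not in the original: the frequency function is taken to be exponential, the transformation is x = ξ², and the integration helper is a simple midpoint rule written for the purpose.

```python
from math import exp


def f(x):
    """Frequency function of x: a standard exponential (illustrative choice)."""
    return exp(-x)


def x_of_xi(xi):
    """Assumed variate transformation x = xi squared."""
    return xi * xi


def dx_dxi(xi):
    """Derivative of the transformation."""
    return 2.0 * xi


def g(xi):
    """Frequency function of xi: f{x(xi)} multiplied by dx/dxi."""
    return f(x_of_xi(xi)) * dx_dxi(xi)


def integrate(func, lo, hi, steps=100_000):
    """Midpoint-rule approximation to the integral of func over (lo, hi)."""
    h = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * h) for i in range(steps)) * h


# Probabilities agree: the event 1 < x < 4 is the event 1 < xi < 2.
prob_x = integrate(f, 1.0, 4.0)
prob_xi = integrate(g, 1.0, 2.0)

# Ratios of frequency functions at corresponding points do not agree:
# x = 1 and x = 4 correspond to xi = 1 and xi = 2.
ratio_x = f(1.0) / f(4.0)
ratio_xi = g(1.0) / g(2.0)

print(prob_x, prob_xi)
print(ratio_x, ratio_xi)
```

The two probabilities coincide, while the ratio of frequency-function values at corresponding points is halved by the factor dx/dξ. That factor is what is lost when the differentials are dropped.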

28. My second point is that in setting up a theory the non-frequentist has to take one 
hurdle which the frequentist by-passes. It is necessary for him to establish that his pro- 
babilities are measurable on a numerical scale unless (like Keynes and some other writers 




on logic) he is content to see his theory so indefinite that practical application is confined 
within very narrow limits. Most authors are prepared to make whatever assumptions are 
necessary to permit their probabilities to be numerically measurable. There is evidently 
some latitude of choice in postulates for the purpose. The simplest course is merely to 
postulate that probability is measurable. Jeffreys starts from a slightly anterior point and 
requires the axiom that of two probabilities p and q, p is either greater than, less than, or 
equal to q. This puts his probabilities in order; he requires a further axiom that if pro- 
babilities are put in order they can be associated by a (1, 1) correspondence with a set of real 
numbers in increasing order. This ensures that there are enough numbers for the purpose, 
and he then calibrates his scale by reference to the theorem that of n equally probable and 
exclusive events the probability of a subset m is m/n, and hence arrives at the theorem that 
any probability can be expressed by a real number. There may be other ways of arriving at 
the same result, but it appears unlikely to me that any of them could avoid making assump- 
tions equivalent to those of Jeffreys. 


DIRECT THEORY 


29. The direct theory of probability, as I am using the expression, consists mainly of the 
formulation of rules for building up from the probabilities of simple propositions the pro- 
babilities of more complex collections of propositions. So far as I know there are no serious 
difficulties or differences of opinion about the simpler rules such as those expressing the 
probabilities of the sum of propositions. Nor are there any problems, other than those of 
pure mathematics, once certain fundamental rules have been established. Such valuable 
and ingenious little books as Whitworth’s Choice and Chance contain many problems, 
but they are all essentially mathematical. There is, however, one rule which offers a stumbling 
block and merits particular attention, namely, the product rule which is usually expressed 
in some form such as 

P(pq|h) = P(p|h) P(q| ph) 
= P(q|h) P(p| qh), (3) 


that is to say, the probability of both p and q on data h is the product of the probabilities of 
p on h and of q on p and h (or, equivalently, of q on h and p on q and h). I find it a useful 
practice, whenever meeting with a new book on probability, to go straight to the derivation 
of the product rule. It provides a kind of touchstone for the author’s whole treatment. 

30. For the frequentist the derivation is simple. If, of a set of n equi-probable and exclusive 
propositions k are favourable to p and q, l to p and m to q, we have 


k/n = (l/n)(k/l) = (m/n)(k/m), (4) 


which establishes the proposition. There can be little doubt, I think, that the simple arith- 
metical identities of equation (4) are the basis of our readiness to accept the product-rule as 
generally valid in any theory. The idea of equi-probability or randomness contains within 
it the product-rule for a finite number of alternatives. There may be difficulties in dealing 
with limiting cases where infinities are involved, but I think they are far from insuperable 
and need not be brought up here to confuse the issue. 
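
The counting argument of § 30 can be verified directly. The following is a minimal sketch with a hypothetical set of alternatives (the integers 1 to 30, each equally probable) and two illustrative propositions; the variable names are assumptions introduced for the example.

```python
from fractions import Fraction

# Hypothetical set of n equi-probable and exclusive alternatives.
alternatives = range(1, 31)


def p(value):
    """Proposition p: the number drawn is even."""
    return value % 2 == 0


def q(value):
    """Proposition q: the number drawn is divisible by 3."""
    return value % 3 == 0


n_total = len(alternatives)                              # n in the text
l_p = sum(1 for v in alternatives if p(v))               # l: favourable to p
m_q = sum(1 for v in alternatives if q(v))               # m: favourable to q
k_both = sum(1 for v in alternatives if p(v) and q(v))   # k: favourable to both

prob_pq = Fraction(k_both, n_total)
via_p = Fraction(l_p, n_total) * Fraction(k_both, l_p)   # P(p|h) P(q|ph)
via_q = Fraction(m_q, n_total) * Fraction(k_both, m_q)   # P(q|h) P(p|qh)

print(prob_pq, via_p, via_q)  # 1/6 1/6 1/6
```

Exact rational arithmetic is used so that the three quantities are seen to be identical rather than merely close.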

31. The non-frequentist has a much harder task to establish the rule. Johnson (Logic, 
vol. 3) treats it as an axiom, and his view is important because he influenced both Keynes 













and Jeffreys. Keynes himself, I think, fell into error on this point. He defines the product 
of probable relations by 


P(p| qh) P(q|h) = P(pq|h), (5) 


where, it must be remembered, equation (5) is merely a rule for manipulating logical symbols, 
not the expression of numerical probabilities. Although there is some latitude of choice in 
which of our elementary propositions are introduced as definitions and which as postulates, 
it is repugnant to general expectation that an important rule such as the product-rule should 
be introduced by mere definition. I find Keynes’s subsequent attempt to establish a numerical 
theory of probability hard to follow, and I cannot see that in his theory the product-rule 
states anything more than that we may multiply numerical probabilities when we may, 
which is true but carries us no further forward. 

32. Jeffreys takes what I think is the only possible course and reverts to Johnson’s 
treatment by taking (3) as axiomatic. 

Now the product-rule is not obvious. We are therefore entitled to consider why it should be 
introduced at all. The somewhat cynical reply that without it we could not get the answers 
we want is insufficient; we still have to explain why we want that kind of answer. Here, 
again, I think the frequentist comes into his own. The only justification for the product-rule 
that I know is based on the fact that it can be established for sets of equi-probable exclusive 
alternatives as in equation (4). I therefore assert that both frequentist and non-frequentist 
theories are compelled to rely, for the justification of the product-rule, on the properties of equi- 
probable exclusive alternatives. We are, I hope, one stage nearer an understanding how the 
two approaches are interlocked. 


INVERSE THEORY 


33. A precisian might reasonably contend that there is no such thing as ‘inverse’ pro- 
bability, but the term is so widely used that one cannot disturb it. ‘Probabilité des causes’ 
comes rather nearer to expressing what we mean by it. The essence of the process lies in its 
attempt to reason from observation to the general law governing observation. In its broadest 
aspect it seeks to provide a science of induction. 

34. The non-frequentist can set up a theory of inverse probability, or perhaps it would 
be better to say a theory of inference, without a great deal of difficulty. Some frequentists 
would deny this, but I think the majority of the grounds of their denial would be found, on 
analysis, to relate to the more basic problems referred to in the earlier parts of this paper. 
It is, for example, legitimate for the non-frequentist to consider the prior probability dis- 
tribution of an unknown parameter (which may be a natural constant such as the mass of 
a proton) and to modify it in the light of experiment according to the relation 


posterior probability = prior probability x likelihood. (6) 


In this way the non-frequentist can proceed, by continued experiment if necessary, to con- 
centrate his probabilities in narrowing ranges, or in short, to get continually nearer to the 
truth; always provided that he can obtain a prior probability to start with. 
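
The narrowing described in § 34 can be sketched by applying relation (6) repeatedly on a grid of candidate parameter values. This is an illustration only: the flat starting prior, the grid, and the record of trials are all assumptions, and the product in (6) is renormalised at each step so that the probabilities sum to unity.

```python
def normalise(weights):
    """Scale a list of non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def update(prior, grid, success):
    """One application of (6): prior times likelihood, then renormalised."""
    likelihood = [theta if success else 1.0 - theta for theta in grid]
    return normalise([pr * li for pr, li in zip(prior, likelihood)])


def spread(dist, grid):
    """Standard deviation of a distribution defined over the grid."""
    mean = sum(pr * t for pr, t in zip(dist, grid))
    variance = sum(pr * (t - mean) ** 2 for pr, t in zip(dist, grid))
    return variance ** 0.5


grid = [i / 100 for i in range(1, 100)]   # candidate values of the parameter
belief = normalise([1.0] * len(grid))     # flat starting prior (an assumption)

# Hypothetical record: each batch of ten trials yields seven successes.
batch = [True] * 7 + [False] * 3

spreads = [spread(belief, grid)]
for _ in range(5):
    for outcome in batch:
        belief = update(belief, grid, outcome)
    spreads.append(spread(belief, grid))

mode = grid[max(range(len(grid)), key=belief.__getitem__)]
print(mode)
print([round(s, 3) for s in spreads])
```

Each batch of observations reduces the spread of the distribution, which concentrates about the observed ratio. A different starting prior would alter the early stages more than the later ones.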

35. It is in the problem of determining the initial probabilities that most of the modern 
controversy has centred. In fact, at least one statistical school of frequentists has felt the 
difficulty so acutely that much of their work on inference has been devoted to finding methods 
which avoid the use of prior probabilities altogether. In doing so they encounter troubles 
of their own which I discuss below. Even the non-frequentists, however, are not free from 




troubles in formulating rules for the determination of prior probabilities. I have already 
referred to the point in §18. The postulate of Bayes, that if nothing is known about prior 
probabilities they are to be assumed equal, appears to me, after many years’ reflexion, to be 
inevitably right. When we are in a genuine state of indecision we ‘toss up for it’. But it is 
very rarely indeed that we are in a complete state of ignorance about the alternatives we are 
considering. The mere fact that we understand what we are talking about implies some kind 
of knowledge. We may, indeed, be unable to set exact values on our vague appreciation of 
the prior position, but that is not really the point. 

36. In an endeavour to avoid Bayes’s postulate several modern writers, notably Fisher, 
Neyman and E. S. Pearson, have propounded methods which are widely accepted in one 
form or another by statisticians. The methods do not always agree between themselves, and 
in one famous case—the Behrens test—give different results; but they have the common 
object of attempting to escape from the necessity of using prior probability distributions. 
I shall discuss, in that order, maximum likelihood, fiducial inference and confidence intervals 
with the object of showing that there is, in fact, no escape except by the introduction of new 
assumptions. 

MAXIMUM LIKELIHOOD 


37. If a probability function f(x, θ) depends on a single unknown parameter θ the prin- 
ciple of maximum likelihood enunciates that to estimate θ we should take that value which, 
for variations in θ, maximizes the likelihood of the observations; for instance, in a sample 
of n independent observations, the function 


L = ∏ f(x_i, θ)  (i = 1, ..., n). (7) 


I have discussed this principle on a previous occasion (1940). All I need say here is 

(a) that in large samples the method gives much the same results as the method of Bayes, 
as is emphasized by Jeffreys; and 

(b) that the strong posterior recommendations of the method (many of which relate to 
large samples) do not obviate the necessity for recognizing that the principle of maximum 
likelihood constitutes a new postulate. 
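
As an illustration of the principle, the product (7) can be maximized numerically. The following is a minimal sketch under assumptions that are not in the original: the probability function is taken to be exponential with rate θ, the sample is invented, and the maximum is found by a crude grid search rather than by calculus.

```python
from math import log


def log_likelihood(theta, xs):
    """Logarithm of the product (7) for f(x, theta) = theta * exp(-theta * x)."""
    return sum(log(theta) - theta * x for x in xs)


# Hypothetical sample of n = 8 independent observations.
sample = [0.8, 1.9, 0.3, 2.4, 1.1, 0.6, 3.2, 1.7]

# Crude grid search over theta in steps of 0.001.
grid = [i / 1000 for i in range(1, 5001)]
theta_hat = max(grid, key=lambda theta: log_likelihood(theta, sample))

# For this particular model the maximizing value is known to be n / sum(x).
closed_form = len(sample) / sum(sample)

print(theta_hat, round(closed_form, 4))
```

The logarithm is maximized instead of the product itself because the two have the same maximizing value and the sum is numerically better behaved.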

38. The principle, in fact, is not obvious in the sense of being immediately and intuitively 
acceptable. It states that we are to proceed on the assumption that the most likely event 
has happened. Now, why? Even if we ignore any prior knowledge, for which the principle 
makes no allowance, and assume for the sake of argument that the most likely event is the 
most probable event (an assumption which appears to me to involve Bayes’s postulate), 
we know, or at least we believe, that the most probable event does not always happen. The 
justification for our principle can then only be that it leads us closer to the truth on the whole 
than other methods. This appears to me to be another form of the postulate which is referred 
to above as necessary for the non-frequentist theory, namely, that what we believe is, on the 
whole, true. I therefore assert that the principle of maximum likelihood requires a new postulate 
which is, in the last analysis, equivalent to one of the assumptions required to validate the non- 
frequentist theory. 

39. Here I interpolate one comment of general application. In seeking for a satisfactory 
logical basis of modern methods, and in pointing out that some of them are not so firmly 
based as they seem, I am in no way trying to undermine the use of those methods. What 
I am trying to do is to bring to light the latent assumptions on which they are founded. The 










same applies to the other methods which I consider below. I think it is desirable to make this 
statement, partly to disarm those writers who may jump to the conclusion that I am laying 
the axe to the roots of their work (which is not my intention), and partly because there is 
a danger that critics of the new methods may swing too far the other way, and in finding 
that there are legitimate grounds for doubt about the logical foundations, may be tempted 
to reject the methods altogether. English writers may not go to these extremes, but the 
following extract from Pietra (1948) will illustrate my meaning. Referring to some recent 
publications by Gini on inverse probability, Pietra says (p. 70): “but there is no doubt now 
that [Gini’s] revision [of the basis of probability in statistics] has brought to light the fallacious 
character of the numerous illusions on which the great success of the Anglo-Saxon develop- 
ment of our subject is based.”* An Anglo-Saxon may be pardoned for feeling a little nettled 
by this kind of statement; but it represents a point of view which he would be wise not to 
ignore. 


FIDUCIAL INFERENCE 


40. The so-called method of ‘fiducial’ inference also requires a new postulate. The results 
which it gives in some cases agree with those given by the theory of confidence intervals 
and by Jeffreys’s form of non-frequentist probability, but since it can give results which are 
not true in the former and is mostly advocated by those who repudiate the latter it can hardly 
be regarded as equivalent to either. The essence of the fiducial process appears to be this: 
if a frequency function of a sufficient statistic t and a parameter θ is 





dF = {∂F(t, θ)/∂t} dt, (8) 


the fiducial distribution of θ is given by 


dF = −{∂F(t, θ)/∂θ} dθ. (9) 

This is used to give a range within which the value of θ may be regarded as lying to an 
acceptable degree of ‘probability’. 
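
A worked case may make the process easier to follow. The example below is not taken from the paper: it assumes a single observation t from an exponential distribution with mean θ, and it assumes the usual sign convention in which the fiducial density is minus the derivative of the distribution function with respect to θ.

```latex
\begin{align*}
F(t, \theta) &= 1 - e^{-t/\theta}, \qquad t > 0,\ \theta > 0, \\
\frac{\partial F}{\partial t}\, dt &= \frac{1}{\theta}\, e^{-t/\theta}\, dt, \\
-\frac{\partial F}{\partial \theta}\, d\theta &= \frac{t}{\theta^{2}}\, e^{-t/\theta}\, d\theta, \\
\int_{0}^{\infty} \frac{t}{\theta^{2}}\, e^{-t/\theta}\, d\theta
  &= \int_{0}^{\infty} e^{-u}\, du = 1 \qquad (u = t/\theta).
\end{align*}
```

The third line has the form of a probability distribution for θ, integrating to unity over its range, although θ is an unknown constant and t is the quantity that was observed.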

41. The first thing to note about an expression such as (9) is that it has a differential 
element dθ. To a frequentist this means nothing in terms of his probabilities because θ is an 
unknown constant and has no probability distribution other than the trivial one of unity 
when θ has the true value. The introduction of fiducial inference then implies not merely 
a new postulate about the behaviour of probability but a new kind of uncertainty. Exactly 
what this kind of uncertainty may be has never been explained, but it is not very unlike 
probability in the sense of degree of belief, at least in some contexts. 

42. Secondly, the transition from (8) to (9) is not by any means an obvious process, and 
I cannot myself see by what argument it is to be supported. At the least it amounts to a new 
postulate. Whether it is acceptable is a matter of taste, which perhaps is not worth discussing. 
The important point is that if one follows R. A. Fisher in rejecting Bayes’s postulate and the 
notion of prior probability in favour of the principles of maximum likelihood and fiducial 
inference, one is not making any economy in new concepts—in fact, the contrary. My 
personal feeling is that Bayes’s postulate is much more acceptable than the fiducial postulate. 
There is another complication in the use of fiducial inference which equally affects confidence 
intervals and I may as well proceed to it at once. 


* ‘Ma é anche fuori di dubbio ormai che dalla revisione stessa é emersa la fallacia delle molte illusioni 
sulle quali fondava il suo maggiore successo l’indirizzo Anglo-Sassone della nostra disciplina.’ 







CONFIDENCE INTERVALS 


43. The theory of confidence intervals is an ingenious attempt to set up a method of making 
statements in probability without the use of prior probabilities or Bayes’s postulate, and 
equally without the use of an alternative postulate. It acknowledges the impossibility of 
making assertions that a parameter (an unknown constant) will lie, to specified degrees of 
probability, within specified limits, but meets the situation by substituting for such limits 
random variables whose values can be observed. It is then possible to choose two random 
variables t_1 and t_2, and to assert, with given probability of being right, that 


t_1 < θ < t_2, 


whatever t_1 and t_2 actually turn out to be when the observations are made. Since it is pos- 
sible, at least in some cases, to choose t_1 and t_2 to be independent of the unknown parameters, 
the probability of being right remains the same whatever values of θ are actually encountered 
in practice. The method thus appears to be independent of any prior distribution of θ, and 
to be quite independent of any approach such as is embodied in Bayes’s postulate. 
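
The claim that the probability of being right does not depend on the value of the parameter can be checked by simulation. The following is a minimal sketch under assumed conditions that are not in the original: the observations are normal with unknown mean θ and known standard deviation, the limits are the sample mean plus and minus 1.96 standard errors, and the function names and trial values of θ are hypothetical.

```python
import random
from math import sqrt


def interval(sample, sigma, z=1.96):
    """Random limits t_1 and t_2, computed from the observations alone."""
    n = len(sample)
    mean = sum(sample) / n
    half_width = z * sigma / sqrt(n)
    return mean - half_width, mean + half_width


def coverage(theta, sigma=1.0, n=10, trials=4000, seed=0):
    """Proportion of repeated samples in which t_1 < theta < t_2 holds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        t_1, t_2 = interval(sample, sigma)
        if t_1 < theta < t_2:
            hits += 1
    return hits / trials


# Very different true values of theta, each with its own random stream.
rates = {
    theta: coverage(theta, seed=i)
    for i, theta in enumerate((-5.0, 0.0, 3.0, 100.0))
}
print(rates)
```

The observed proportions should all lie close to 0.95, whichever value of θ generated the data. The simulation says nothing about whether any particular interval, once computed, ought to be believed.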





[Fig. 1. Confidence lines for the binomial distribution. Ordinate: value of ϖ; abscissa: value of p, from 0 to 1.] 


44. Consider again the example referred to in §14. Suppose I assume that a sampling 
process is such as to reproduce a binomial distribution—there is a good deal of evidence for 
this in the case of births. I observe a value of 0·60 as the ratio of male to total births in a sample 
of 10,000. The theory of confidence intervals says that I may assert that the proportion p 
lies between 0·59 and 0·61 with the probability that, if I make this type of assertion syste- 
matically in all similar cases, I shall be about 95 % right in the long run. But I do not then 
make such an assertion because I know too much about birth-rates to believe any such thing. 
The theory of confidence intervals gives no place to prior knowledge of the situation. How, 
then, can that theory provide a guide to conduct in making decisions? 
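The limits 0.59 and 0.61 quoted above can be checked with a short calculation. The sketch below assumes the usual large-sample normal approximation to the binomial with multiplier 1.96; the function name `wald_interval` is illustrative and does not come from the paper.

```python
import math

def wald_interval(p_hat, n, z=1.96):
    """Approximate central confidence limits for a binomial proportion.

    Assumption: normal approximation with standard error sqrt(p(1-p)/n).
    """
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Observed ratio 0.60 in a sample of 10,000, as in the text.
lower, upper = wald_interval(0.60, 10_000)
print(round(lower, 4), round(upper, 4))
```

Under these assumptions the limits round to 0.59 and 0.61, in agreement with the figures in the paragraph.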

45. This difficulty arises in most of the accepted techniques of modern statistics which, 
while using judgements in probability, make no allowance for prior probabilities. I shall 
discuss confidence intervals to fix the ideas; but much the same considerations apply, for 
example, to the theory of testing hypotheses, to fiducial inference, to the analysis of variance 
and to sequential analysis. The fundamental problem is to find some method of incorporating 
prior knowledge or prior probabilities into the final probabilistic judgement of the situation. 
In order to consider it I need to recall briefly the nature of the theory of confidence intervals. 

46. The diagram of Fig. 1 is a familiar presentation of confidence intervals for the binomial
distribution.












114 Reconciliation of theories of probability 


For any given value $\varpi$ and fixed sample number $n$ we may calculate the binomial
distribution $(\chi + \varpi)^n$, and, having decided on a confidence coefficient $\alpha$, find a range of values from
$p_1$ to $p_2$, such that the proportion of the total distribution lying inside the range is $\alpha$. (A minor
question as to discontinuities in values of $p_1$ and $p_2$ I ignore as not affecting the argument.)
Having further decided (as I shall for simplicity, again without affecting the argument) to
take central confidence intervals, we may map the confidence lines $L_1$ and $L_2$ by plotting,
for each value of $\varpi$, the appropriate values of $p_1$ and $p_2$. The confidence lines are, as it were,
constructed horizontally by plotting the abscissae for selected ordinates.

To use the diagram we read it vertically. For a given abscissa (observed value of $p$) we
read off on the confidence lines the corresponding ordinates (values of $\varpi$), say $\varpi_1$ (on $L_1$)
and $\varpi_2$ (on $L_2$). We then assert that

$$\varpi_1 < \varpi < \varpi_2, \qquad (10)$$

with an assurance of being in the long run right in proportion $\alpha$ of the cases in which we make
this type of assertion.

47. From the way in which the diagram is constructed it follows that, whatever the
frequency distribution of $\varpi$ may be in the cases to which we apply the method, a proportion
$\alpha$ of the happenings will lie in the confidence belt between the confidence lines; for the
proportion is $\alpha$ in each horizontal elementary strip, and hence is so for the belt as a whole even
if different weights are assigned to different strips. Now let us suppose that we know
something about the prior probabilities of $\varpi$; to take an extreme case I shall suppose that we
know that $\varpi$ lies in a certain range, say $M_1$ to $M_2$. If we adhere to the confidence rules we shall
still assert (10) and shall still be right in proportion $\alpha$ of the cases. But nobody in his senses
would assert (10) if the range $\varpi_1$ to $\varpi_2$ was outside the range $M_1$ to $M_2$ at any point. We shall
continue to be right in proportion $\alpha$ of the cases if we assert an inequality

$$y_1 \le \varpi \le y_2, \qquad (11)$$

where $y_1$ is the greater of $\varpi_1$ and $M_1$ and $y_2$ is the smaller of $\varpi_2$ and $M_2$. In short, we can
modify and shorten our confidence intervals in the light of prior information without loss
of accuracy.
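The claim that clipping the interval to a known range loses no accuracy can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: it uses a normal-approximation interval in place of the exact confidence belt of Fig. 1, draws the true parameter uniformly from a hypothetical range 0.4 to 0.6, and uses illustrative names (`simulate`, `m1`, `m2`) that are not from the paper.

```python
import math
import random

def wald_interval(p_hat, n, z=1.96):
    # Assumption: normal-approximation limits stand in for the exact belt.
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def simulate(trials=2000, n=50, m1=0.4, m2=0.6, seed=1):
    rng = random.Random(seed)
    plain_hits = clipped_hits = 0
    plain_len = clipped_len = 0.0
    for _ in range(trials):
        w = rng.uniform(m1, m2)                    # true parameter, known to lie in [m1, m2]
        successes = sum(rng.random() < w for _ in range(n))
        w1, w2 = wald_interval(successes / n, n)   # limits asserted by (10)
        y1, y2 = max(w1, m1), min(w2, m2)          # limits asserted by (11)
        plain_hits += w1 <= w <= w2
        clipped_hits += y1 <= w <= y2
        plain_len += w2 - w1
        clipped_len += max(0.0, y2 - y1)
    return plain_hits, clipped_hits, plain_len / trials, clipped_len / trials

plain_hits, clipped_hits, plain_len, clipped_len = simulate()
print(plain_hits, clipped_hits, round(plain_len, 3), round(clipped_len, 3))
```

Because the true value never leaves the known range, every trial in which (10) is right is also one in which (11) is right, so the two hit counts agree exactly while the clipped intervals are on average no longer.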

48. To take a slightly more general case let us now suppose that we have a known prior
probability distribution of $\varpi$, $f(\varpi)\,d\varpi$, which varies effectively in the range 0 to 1. We can
still improve on the inequality (10) without loss of accuracy. We merely have to determine
a domain in the square of Fig. 1 such that the total probability (allowing for variation of
$\varpi$ as well as of $p$) shall be $\alpha$ and obeying certain elementary requirements as to connexity
and convexity of the confidence belt. The situation is essentially the same as if, in
determining the confidence lines, we started with probability $f(\varpi)(\chi + \varpi)^n$ instead of the second
factor only.

49. Provided, then, that we know the prior distribution $f(\varpi)$ we can incorporate it into the
determination of our confidence belt without much difficulty. The essence of the method of
confidence intervals is not that it takes $f(\varpi)$ to be unity, to comply with Bayes’s postulate
(though that is its effect) but that it ignores $f(\varpi)$ and can still obtain accurate statements in
probability. It obtains (in this case and all cases where ad hoc prior distributions are not
introduced for special reasons) the same results as if Bayes’s postulate were applied. But to
avoid the use of the postulate and at the same time to maintain rigour it has to sacrifice a
great deal. It pretends, so to speak, either that there is no prior probability in the frequency
sense which is being used to set up the confidence intervals, or that any prior knowledge must







be blended with the results of the theory in some manner unspecified before the final judge- 
ment is made. I suspect that the advocates of confidence intervals have often forgotten this 
fact; I think that they often do not appreciate its importance; I am certain that they fail 
to bring it out adequately in their expositions. I therefore assert that the theory of confidence
intervals offers only a partial solution to the problem of estimation, and that to provide a basis of 
rational action it is necessary either to introduce prior probabilities into the theory or to find some 
new way of linking the theory with prior knowledge or prior expectation. 

50. Two further points require stressing in connexion with confidence intervals. First, 
it is part of the hypothesis that the observations are being generated by a random process. 
We believe this, as a rule, on the basis of collateral evidence, i.e. on the basis of prior know- 
ledge. I know of no convincing explanation why we are supposed to use this prior knowledge 
but to ignore any we may have about the parameter under estimate. Secondly, the method 
requires some theorem such as Bernoulli’s to the effect that if the probability of an event is 
p it will happen in proportion p of the cases in the long run. There has been a good deal of 
discussion and some misunderstanding about the role of Bernoulli’s theorem in the theory 
of probability. Essentially it is a proposition in pure mathematics which may be expressed
by saying that the proportion of total frequency in the binomial series $(\chi + \varpi)^n$ in the
neighbourhood of the mode $n\varpi$ tends to unity as $n$ tends to infinity, ‘neighbourhood’ meaning
within any fixed multiple of the dispersion $\sqrt{(n\varpi\chi)}$ however small. It is not a justification
of the frequency theory of probability (as Bernoulli himself seems to have thought) in the 
sense that it asserts anything about the frequency of happenings of equi-probable alter- 
natives. No amount of mathematics, in fact, can assert anything about events which was 
not latent in the premisses concerning those events. Consequently, in relying on the result 
of the theorem (or anything equivalent to the effect that events of probability p will happen 
in proportion p of the cases), the theory of confidence intervals, like the theory of direct 
probability from the frequentist viewpoint, requires the basic assumption that the processes 
with which it is concerned do behave in the manner required by the theory; in short, that 
random processes do exist. 

51. It would be tedious to trace the same points through other statistical techniques such 
as the theory of testing hypotheses. I will merely note that they occur in much the same form 
and, so far as I know, do not raise any essentially different problems. Not that there are no 
other parallel problems in statistical theory—conditional inference and order-statistics are 
two examples of cases where further examination of fundamentals is required—but that the 
main points have been covered in the foregoing. 

52. A friend of mine once remarked to me that if some people asserted that the earth 
rotated from east to west and others that it rotated from west to east, there would always 
be a few well-meaning citizens to suggest that perhaps there was something to be said for 
both sides, and that maybe it did a little of one and a little of the other; or that the truth 
probably lay between the extremes and perhaps it did not rotate at all. I wish, in conclusion,
to emphasize that I am not attempting a compromise of this kind in endeavouring to reconcile 
the different theories which have been put forward by different authorities. It is not so much 
a question of choosing between viewpoints as of synthesizing them in order to get a complete 
picture of the whole. 










REFERENCES 


An account of the various statistical techniques mentioned in this paper is given in my Advanced
Theory of Statistics together with extensive references which I need not repeat. The specific works
alluded to are:

CRAMÉR, H. (1945). Mathematical Methods of Statistics. Uppsala: Almqvist and Wicksell.

JEFFREYS, H. (1939). Theory of Probability. Oxford University Press. (2nd ed. 1948.)

JOHNSON, E. W. (1926). Logic. Cambridge University Press.

KENDALL, M. G. (1940). On the method of maximum likelihood. J. R. Statist. Soc. 103, 387.

KEYNES, J. M. (1921). A Treatise on Probability. London: Macmillan.

KOLMOGOROFF, A. (1933). Grundbegriffe der Wahrscheinlichkeitsrechnung. Berlin: Springer. (Reprint,
1946, by the Chelsea Publishing Company, New York.)

PIETRA, G. (1948). Studi di Statistica Methodologica. Milan: Giuffré.

VON MISES, R. (1928). Wahrscheinlichkeit, Statistik und Wahrheit, 3rd ed. revised 1936. Berlin:
Springer. (English translation, 1939, W. Hodge, London.)







THE DERIVATION AND PARTITION OF $\chi^2$ IN CERTAIN
DISCRETE DISTRIBUTIONS


By H. O. LANCASTER, M.B., B.S., B.A. (SYDNEY)


Rockefeller Fellow in Medicine 


SUMMARY 


1. (1) It is shown how the general term of any multinomial can be reduced to a series of
binomial terms, to each of which corresponds a value of $\chi^2$ for one degree of freedom. In an
accompanying paper (pp. 130-4 below), Dr J. O. Irwin (1949) has shown that corresponding
to this reduction there is an exact partition of $\chi^2$ and a certain Helmert matrix. This partition
can be formally related to regression analysis by showing that it is equivalent to the selection
of variables of the form $x_{i.jk\ldots}/\sigma_{i.jk\ldots}$.

(2) The expression for the probability of an $r \times s$ contingency table can be partitioned into
the product of the probability of $(r-1)(s-1)$ fourfold tables, the $\chi$ of each of which is
uncorrelated with that of any of the others. Asymptotically, when the expected frequencies
are large, all the $\chi$’s are normally distributed so that we have $(r-1)(s-1)$ normal and
uncorrelated deviates. Any difficulties as to degrees of freedom are avoided in this proof.

(3) It is further shown that corresponding to the method of treating the $r \times s$ table set out
in (2), there is an exact partition of $\chi^2$ which can be obtained by pre- and post-multiplication
of the $(r \times s)$ matrix of standardized variables by certain Helmert matrices. This operation
makes the variables in the first row and in the first column all zero, leaving a matrix with
$(r-1)(s-1)$ standardized and uncorrelated variables. Each of these variables is the $\chi$ of
one of the component fourfold tables, when calculated by the use of expectations obtained
from the original margins and not from the component table itself.

(4) Numerical examples are given for the case of a multinomial distribution, for the
fourfold table and for a $3 \times 3$ table. In each case the partition of $\chi^2$ is illustrated.


THE BINOMIAL AND MULTINOMIAL DISTRIBUTIONS, CONTINGENCY TABLES AND $\chi^2$


Introductory 


2. Proofs of the distribution of $\chi^2$ in contingency tables have been available since the now
classic articles of Karl Pearson (1900), R. A. Fisher (1922, 1924) and G. Udny Yule (1922).
However, the specification of $n$ standardized variables corresponding to the individual
degrees of freedom, especially those corresponding to comparisons of practical interest, has
often raised difficulties. Sometimes we may be interested in a partition which is only
asymptotically exact for large samples. For instance, in the example quoted by Fisher (1944)
of Weldon’s dice throws, the binomial $(\tfrac13 + \tfrac23)^{12}$ was fitted to 26,306 throws of 12 dice. A test
showed that the dice were biased ($\chi^2 = 35.491$ with 10 D.F.); but when the value of $p$ in the
binomial was estimated from the data a satisfactory fit was obtained ($\chi^2 = 8.179$ with 9 D.F.).
The difference is 27.312, and this is due to the difference between the estimated and
theoretical values of $p$. In fact, the ratio of $(p - \tfrac13)$ to its standard error is 5.20, the square of which
is 27.04 and represents $\chi^2$ for 1 D.F. This partition is only approximate, but the equation

$$P(f_1, f_2, \ldots, f_n \mid P) = P(f_1, f_2, \ldots, f_n \mid p)\, P(p \mid P) \qquad (1)$$

suggests that the partition may be asymptotically true. In fact, an application of Stirling’s
formula shows that (1) is asymptotically equivalent to

$$(2\pi)^{-\frac12 n} \exp(-\tfrac12 \chi_n^2) = (2\pi)^{-\frac12 (n-1)} \exp(-\tfrac12 \chi_{n-1}^2) \times (2\pi)^{-\frac12} \exp(-\tfrac12 \chi_1^2), \qquad (2)$$

where $\chi_n^2$, $\chi_{n-1}^2$, $\chi_1^2$ are $\chi^2$ with $n$, $(n-1)$ and 1 D.F. respectively, and $n = 10$ in this case.
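The arithmetic of the Weldon example can be retraced in a few lines. The sketch below uses only the figures quoted in the text; the standard error is computed on the assumption that each of the 26,306 × 12 dice counts as one binomial trial with theoretical probability 1/3, which the text does not state explicitly.

```python
import math

chi2_theoretical_p = 35.491   # 10 D.F., quoted in the text
chi2_estimated_p = 8.179      # 9 D.F., quoted in the text
difference = chi2_theoretical_p - chi2_estimated_p

ratio = 5.20                  # (p - 1/3) / standard error, quoted in the text
chi2_one_df = ratio ** 2

# Assumption: 26,306 throws of 12 dice give 315,672 binomial trials with P = 1/3.
trials = 26_306 * 12
standard_error = math.sqrt((1 / 3) * (2 / 3) / trials)

print(round(difference, 3), round(chi2_one_df, 2), round(standard_error, 6))
```

The difference 27.312 and the squared ratio 27.04 agree only roughly, which is the sense in which the partition is described as approximate.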

In the present paper it is shown that the multinomial can be expressed as a product of
$(n-1)$ independent binomials. Stirling’s theorem is then applied first to the whole
distribution and secondly to each of these binomial expressions. In the first case the usual value for
$\chi^2$, namely, $\sum (x-m)^2/m$, where $x$ is the observed and $m$ the expected frequency, is obtained. In
the second case we obtain $(n-1)$ normally and independently distributed variables with unit
variance and zero mean, the explicit forms of which arise naturally out of the algebra. By
equating the usual $\chi^2$ to this second sum, it is clear that $\chi^2$ is distributed with $(n-1)$ degrees
of freedom. This derivation is then generalized to show that $\chi^2$ for an $(r \times s)$ contingency table
can be partitioned into the $\chi^2$’s of $(r-1)(s-1)$ fourfold tables with 1 D.F. each.

These partitions of $\chi^2$ are only asymptotically exact, but Dr J. O. Irwin has shown that
they can be made exact, for any size of sample, for the multinomial distribution and for
contingency tables, by a slight modification which amounts to finding a matrix of Helmert’s
type to transform the standardized variables, (observed - expected)/$\sqrt{\text{expected}}$.


Notation 


3. Multinomial. $p_i$ will refer to the probability of an observation belonging to the $i$th
class ($i = 1, 2, \ldots, n$). The number of observations falling into the $i$th class will be $a_i$ and
$\sum_{i=1}^{n} a_i = a$. The multinomial will thus be $(p_1 + p_2 + \ldots + p_n)^a$.

Contingency tables. A similar notation will be used for contingency tables. $p_{ij}$ will be the
probability of an observation falling into the class in the $i$th row and $j$th column where
$i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, s$. $a_{ij}$ will be the observed number of observations in this class.
We define

$$p_{i.} = \sum_{j=1}^{s} p_{ij}, \qquad p_{.j} = \sum_{i=1}^{r} p_{ij}, \qquad (3)$$

and similarly for the $a_{ij}$. Further

$$\sum_{i} p_{i.} = \sum_{j} p_{.j} = \sum_{i,j} p_{ij} = 1, \qquad \sum_{i} a_{i.} = \sum_{j} a_{.j} = \sum_{i,j} a_{ij} = a. \qquad (4)$$

We also write

$$R_{lk} = \sum_{j=1}^{k} a_{lj}, \qquad C_{lk} = \sum_{i=1}^{l} a_{ik}, \qquad T_{lk} = \sum_{i=1}^{l} \sum_{j=1}^{k} a_{ij}.$$


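THE DERIVATION OF $\chi^2$ FROM DISCRETE DISTRIBUTIONS
BY USE OF STIRLING’S APPROXIMATION
4. In the multinomial distribution $(p_1 + p_2 + \ldots + p_n)^a$,

$$P(a_i \mid p_i, a) = a! \prod_{i=1}^{n} \frac{p_i^{a_i}}{a_i!} = \frac{p_n^{a_n}\, p_{n-1}^{a_{n-1}}\, (a_n + a_{n-1})!}{(p_n + p_{n-1})^{a_n + a_{n-1}}\, a_n!\, a_{n-1}!} \times \text{general term of } (p_1 + p_2 + \ldots + p_{n-2} + \{p_n + p_{n-1}\})^a. \qquad (5)$$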










So that it is easily seen that $P(a_i \mid p_i, a)$ can be expressed as the product of the general terms
of a binomial and a multinomial of one fewer terms. This process can be continued, and
gives us the following:

Theorem (asymptotically true). Any general term of a multinomial expansion can be
written so as to represent the product of $(n-1)$ binomial terms. Each of these terms can be
represented by an expression of the form

$$\{1/\sigma_i \sqrt{(2\pi)}\} \exp(-\tfrac12 \chi_i^2).$$

The $\chi_i$ are uncorrelated. With large expectations, the $\chi_i$ are normally distributed and hence
independent. Then $\sum_{i=1}^{n-1} \chi_i^2$ will equal $\chi^2$ as usually determined and will therefore be independent
of the order of breaking up of the multinomial and will be distributed as $\chi^2$ for $(n-1)$ degrees
of freedom.

Proof. The first statement follows from the remark at the beginning of this section. That
the $\chi_i$’s are uncorrelated follows from the fact that at any stage the residual multinomial
does not depend on the factors $a_n$, $a_{n-1}$, etc., already isolated in determining the individual
$\chi$-terms.

As $a \to \infty$ the $\chi_i$ as defined will tend to normality in virtue of the properties of the binomial.
Since they are uncorrelated, they will then be independent. $P(a_i \mid p_i, a)$ has now been reduced
to the product of $(n-1)$ binomial terms, each of which may be reduced by Stirling’s
approximation to the form

$$\frac{1}{\sigma_i \sqrt{(2\pi)}} \exp(-\tfrac12 \chi_i^2), \qquad (6)$$

where

$$\sigma_i^2 = (a_n + a_{n-1} + \ldots + a_{n-i})\, p_{n-i}\, (p_n + p_{n-1} + \ldots + p_{n-i+1}) / (p_n + p_{n-1} + \ldots + p_{n-i})^2$$

and

$$\chi_i = \frac{1}{\sigma_i} \left\{ a_{n-i} - \frac{(a_n + a_{n-1} + \ldots + a_{n-i})\, p_{n-i}}{p_n + p_{n-1} + \ldots + p_{n-i}} \right\}. \qquad (7)$$
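The reduction of a multinomial term to a product of binomial terms can be checked numerically. The following is a minimal sketch under stated assumptions: the counts and probabilities are hypothetical, the function names are illustrative, and the $\chi_i$ are computed from (7) as reconstructed above. It verifies only the exact factorization of the probability; the agreement of $\sum \chi_i^2$ with the usual $\chi^2$ is asymptotic and is printed for comparison rather than asserted.

```python
import math

def multinomial_pmf(counts, probs):
    total = sum(counts)
    coef = math.factorial(total)
    for c in counts:
        coef //= math.factorial(c)
    return coef * math.prod(p ** c for p, c in zip(probs, counts))

def binomial_factors(counts, probs):
    """Peel off classes n-1, n-2, ..., 1 in turn, as in the remark following (5)."""
    factors, chis = [], []
    pooled_count, pooled_prob = counts[-1], probs[-1]   # starts as class n alone
    for k in range(len(counts) - 2, -1, -1):
        size = pooled_count + counts[k]                 # a_n + ... + a_{n-i}
        q = probs[k] / (pooled_prob + probs[k])         # p_{n-i} / (p_n + ... + p_{n-i})
        factors.append(math.comb(size, counts[k])
                       * q ** counts[k] * (1 - q) ** (size - counts[k]))
        sigma = math.sqrt(size * q * (1 - q))           # sigma_i of (7)
        chis.append((counts[k] - size * q) / sigma)     # chi_i of (7)
        pooled_count, pooled_prob = size, pooled_prob + probs[k]
    return factors, chis

counts = [5, 3, 2, 6]                 # hypothetical observations
probs = [0.1, 0.2, 0.3, 0.4]          # hypothetical class probabilities
factors, chis = binomial_factors(counts, probs)

a = sum(counts)
pearson = sum((c - a * p) ** 2 / (a * p) for c, p in zip(counts, probs))
print(math.prod(factors), multinomial_pmf(counts, probs))
print(sum(x * x for x in chis), pearson)
```

With so small a sample the two sums of squares need not agree closely; they approach one another as the expectations grow.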


5. This exact partition can be shown to be equivalent to finding $(n-1)$ uncorrelated
variables from Karl Pearson’s (1900) correlation matrix, when $a$ is indefinitely large, by
using the methods of multiple correlation. We may use the notation of multiple correlation
and take $n$ correlated variables $w_i = (a p_i - a_i)/\sqrt{\{a p_i (1 - p_i)\}}$, each of which has an expectation
zero and unit variance. Then the total and partial correlation coefficients, the variances and
regression coefficients are given by

$$r_{ij} = -\sqrt{\{p_i p_j / (1 - p_i)(1 - p_j)\}}, \qquad (8)$$

$$r_{ij.kl\ldots m} = -\sqrt{\left( p_i p_j / [\{1 - p_i - (p_k + p_l + \ldots + p_m)\}\{1 - p_j - (p_k + p_l + \ldots + p_m)\}] \right)}, \qquad (9)$$

$$\sigma_i^2 = 1, \qquad (10)$$

$$\sigma_{i.j}^2 = \{1 - (p_i + p_j)\} / \{(1 - p_i)(1 - p_j)\}, \qquad (11)$$

$$\sigma_{i.jk\ldots m}^2 = \{1 - (p_i + p_j + \ldots + p_m)\} / [(1 - p_i)\{1 - (p_j + p_k + \ldots + p_m)\}], \qquad (12)$$

$$\beta_{ij.kl\ldots m} = -\sqrt{\{(1 - p_j)\, p_i p_j / (1 - p_i)\}} / \{1 - p_j - (p_k + p_l + \ldots + p_m)\}. \qquad (13)$$

It follows from the theory of normal correlation that $w_1/\sigma_1$, $w_{2.1}/\sigma_{2.1}$, $w_{3.12}/\sigma_{3.12}$, etc. are all
asymptotically normally distributed with zero mean and unit variance and that they are
uncorrelated, and further it is easy to see that $w_{n.12\ldots(n-1)} = 0$, so that the first $(n-1)$ of
these standardized variables form a series of $(n-1)$ $\chi$’s and so give a partition of $\chi^2$ for the
multinomial. It is easily verified that they give a series identical with our exact partition
of $\chi^2$. This analogy can be extended to the case of the general contingency table.
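Formulae (8), (9) and (11) can be checked against the standard relations of multiple correlation. The sketch below is an illustration only: the probabilities are hypothetical, and the first-order recursion for a partial correlation is the textbook identity, brought in here as an assumption rather than taken from the paper.

```python
import math

def r_total(p_i, p_j):
    """Total correlation between two multinomial classes, formula (8)."""
    return -math.sqrt(p_i * p_j / ((1 - p_i) * (1 - p_j)))

def r_partial(p_i, p_j, held):
    """Partial correlation, formula (9); `held` lists the p's of the eliminated classes."""
    s = sum(held)
    return -math.sqrt(p_i * p_j / ((1 - p_i - s) * (1 - p_j - s)))

def r_partial_by_recursion(p_i, p_j, p_k):
    """Textbook first-order partial correlation built from total correlations."""
    rij, rik, rjk = r_total(p_i, p_j), r_total(p_i, p_k), r_total(p_j, p_k)
    return (rij - rik * rjk) / math.sqrt((1 - rik ** 2) * (1 - rjk ** 2))

def residual_variance(p_i, p_j):
    """Formula (11)."""
    return (1 - (p_i + p_j)) / ((1 - p_i) * (1 - p_j))

p1, p2, p3 = 0.1, 0.2, 0.3            # hypothetical class probabilities
print(r_partial(p1, p2, [p3]), r_partial_by_recursion(p1, p2, p3))
print(residual_variance(p1, p2), 1 - r_total(p1, p2) ** 2)
```

Each printed pair should agree to rounding error, which is a useful check on the signs and brackets of the reconstructed formulae.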











MANIFOLD CONTINGENCY TABLES TREATED BY $\chi^2$ BY AN
EXTENSION OF THE FOURFOLD TABLE


6. Yule (1922) suggested that $(r-1)(s-1)$ independent comparisons could be made in
the $r \times s$ contingency table but did not give a proof that each corresponded to an independent
value of $\chi^2$. It is the purpose of this section to show that every $r \times s$ table may be reduced to
$(r-1)(s-1)$ fourfold tables; the value of $\chi$ in each fourfold table will be independent of the
other tables; these $\chi^2$ will be summed to give $\chi^2$ for $(r-1)(s-1)$ degrees of freedom; finally,
we shall prove that this value of $\chi^2$ is unique and equal to that derived by the formula
(observed - expected)$^2$ ÷ expected, summed for every cell of the $r \times s$ table.

7. We shall first prove that a $2 \times 3$ table can be reduced to two fourfold tables and a $3 \times 3$
table can be reduced to four fourfold tables. We shall then extend this by induction to $r \times s$
tables.


We take as the null hypothesis that there is no association between the probability that
an observation should fall in any row and in any column, i.e.

$$p_{ij} = p_{i.}\, p_{.j}.$$

It is easily shown that the probability of the $2 \times 2$ table may be reduced by means of
Stirling’s approximation, so that

$$P(a_{ij} \mid a_{i.}, a_{.j}) \sim \left\{ \frac{a^3}{2\pi\, a_{1.} a_{2.} a_{.1} a_{.2}} \right\}^{1/2} \exp\left\{ -\frac12\, \frac{a^3 \Delta^2}{a_{1.} a_{2.} a_{.1} a_{.2}} \right\}, \qquad (14)$$

where $\Delta = a_{11} - a_{1.} a_{.1}/a$.
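The quality of the approximation (14) can be examined by comparing it with the exact probability of a fourfold table with fixed margins. The sketch below is illustrative: the cell frequencies are hypothetical, the function names are not from the paper, and the exact probability is evaluated through `math.lgamma` to avoid very large factorials.

```python
import math

def log_factorial(n):
    return math.lgamma(n + 1)

def exact_probability(a11, a12, a21, a22):
    """Exact probability of the fourfold table given its margins."""
    r1, r2 = a11 + a12, a21 + a22
    c1, c2 = a11 + a21, a12 + a22
    a = r1 + r2
    log_p = (sum(map(log_factorial, (r1, r2, c1, c2)))
             - log_factorial(a)
             - sum(map(log_factorial, (a11, a12, a21, a22))))
    return math.exp(log_p)

def stirling_probability(a11, a12, a21, a22):
    """Approximation (14), with Delta = a11 - a1. a.1 / a."""
    r1, r2 = a11 + a12, a21 + a22
    c1, c2 = a11 + a21, a12 + a22
    a = r1 + r2
    delta = a11 - r1 * c1 / a
    scale = a ** 3 / (r1 * r2 * c1 * c2)
    return math.sqrt(scale / (2 * math.pi)) * math.exp(-0.5 * scale * delta ** 2)

cells = (300, 280, 310, 305)          # hypothetical fourfold table
exact = exact_probability(*cells)
approx = stirling_probability(*cells)
print(exact, approx, abs(approx - exact) / exact)
```

For frequencies of this size the relative error should be small; it grows as the cell frequencies become smaller.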


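The case of the 2 × 3 table
It follows that when there is no association we may write

$$P(a_{ij} \mid a, p_{ij}) = a! \prod_{i,j} \frac{p_{ij}^{a_{ij}}}{a_{ij}!} \qquad (i = 1, 2;\; j = 1, 2, 3)$$

$$= \frac{a!}{a_{1.}!\, a_{2.}!}\, p_{1.}^{a_{1.}} p_{2.}^{a_{2.}} \times \frac{a!}{a_{.1}!\, a_{.2}!\, a_{.3}!}\, p_{.1}^{a_{.1}} p_{.2}^{a_{.2}} p_{.3}^{a_{.3}} \times \frac{a_{1.}!\, a_{2.}!\, a_{.1}!\, a_{.2}!\, a_{.3}!}{a! \prod_{i,j} a_{ij}!}. \qquad (15)$$

Thus the row and column totals are sufficient statistics for $p_{i.}$ and $p_{.j}$ respectively. Further
we have

$$P(a_{ij} \mid a, p_{ij}) = P(a_{i.} \mid a, p_{i.})\, P(a_{.j} \mid a, p_{.j})\, P(a_{ij} \mid a_{i.}, a_{.j}).$$

Hence

$$P(a_{ij} \mid a_{i.}, a_{.j}) = \frac{a_{1.}!\, a_{2.}!\, a_{.1}!\, a_{.2}!\, a_{.3}!}{a! \prod_{i,j} a_{ij}!} \qquad (16)$$

and, summing over all $a_{ij}$, $\sum P(a_{ij} \mid a_{i.}, a_{.j}) = 1$.
By a rearrangement,

$$P(a_{ij} \mid a_{i.}, a_{.j}) = \frac{R_{12}!\, R_{22}!\, a_{.1}!\, a_{.2}!}{a_{11}!\, a_{12}!\, a_{21}!\, a_{22}!\, T_{22}!} \times \frac{a_{1.}!\, a_{2.}!\, T_{22}!\, a_{.3}!}{R_{12}!\, a_{13}!\, R_{22}!\, a_{23}!\, a!}, \qquad (17)$$

the two terms on the right-hand side being the probabilities corresponding to the fourfold
tables

$$\begin{array}{cc|c} a_{11} & a_{12} & R_{12} \\ a_{21} & a_{22} & R_{22} \\ \hline a_{.1} & a_{.2} & T_{22} \end{array} \qquad\qquad \begin{array}{cc|c} R_{12} & a_{13} & a_{1.} \\ R_{22} & a_{23} & a_{2.} \\ \hline T_{22} & a_{.3} & a \end{array} \qquad (18)$$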
















If $\chi_1$, $\chi_2$ are standardized normal deviates derived from these tables by Stirling’s
approximation, we see that $\chi_1$ depends only on redistribution of observations in the first table with
fixed marginal totals and hence is independent of $\chi_2$ in the second table. Then $\chi^2 = \chi_1^2 + \chi_2^2$
may be calculated and has 2 D.F. When we come to the general case we shall prove that this
is $\chi^2$ as usually calculated within the limits of our approximation (Stirling).
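The rearrangement of the 2 × 3 probability into two fourfold probabilities is an exact identity and can be confirmed with rational arithmetic. The sketch below uses a hypothetical table; the function names are illustrative.

```python
from fractions import Fraction
from math import factorial

def fourfold_probability(a, b, c, d):
    """Probability of the fourfold table (a b / c d) with its margins fixed."""
    num = factorial(a + b) * factorial(c + d) * factorial(a + c) * factorial(b + d)
    den = (factorial(a + b + c + d)
           * factorial(a) * factorial(b) * factorial(c) * factorial(d))
    return Fraction(num, den)

def table_probability(table):
    """Probability of an r x s table with its margins fixed."""
    num = 1
    for row in table:
        num *= factorial(sum(row))
    for col in zip(*table):
        num *= factorial(sum(col))
    den = factorial(sum(map(sum, table)))
    for row in table:
        for x in row:
            den *= factorial(x)
    return Fraction(num, den)

t = [[4, 2, 3],
     [1, 5, 2]]                       # hypothetical 2 x 3 table
R12 = t[0][0] + t[0][1]               # partial row totals over the first two columns
R22 = t[1][0] + t[1][1]

first = fourfold_probability(t[0][0], t[0][1], t[1][0], t[1][1])
second = fourfold_probability(R12, t[0][2], R22, t[1][2])
print(first * second == table_probability(t))
```

The second fourfold table pools the first two columns and sets them against the third, so its margins are those of the whole table.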


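The case of the 3 × 3 table

$$P(a_{ij} \mid a_{i.}, a_{.j}) = \frac{\prod_i a_{i.}! \prod_j a_{.j}!}{a! \prod_{i,j} a_{ij}!}, \qquad \text{where } i = 1, 2, 3;\; j = 1, 2, 3,$$

$$= \frac{R_{12}!\, R_{22}!\, C_{21}!\, C_{22}!}{a_{11}!\, a_{12}!\, a_{21}!\, a_{22}!\, T_{22}!} \times \frac{a_{1.}!\, a_{2.}!\, T_{22}!\, C_{23}!}{R_{12}!\, a_{13}!\, R_{22}!\, a_{23}!\, T_{23}!}$$

$$\times \frac{T_{22}!\, R_{32}!\, a_{.1}!\, a_{.2}!}{C_{21}!\, C_{22}!\, a_{31}!\, a_{32}!\, T_{32}!} \times \frac{T_{23}!\, a_{3.}!\, T_{32}!\, a_{.3}!}{T_{22}!\, C_{23}!\, R_{32}!\, a_{33}!\, a!}, \qquad (19)$$

which may be represented in a schema as follows:

$$\begin{array}{cc|c} a_{11} & a_{12} & R_{12} \\ a_{21} & a_{22} & R_{22} \\ \hline C_{21} & C_{22} & T_{22} \end{array} \qquad\qquad \begin{array}{cc|c} R_{12} & a_{13} & a_{1.} \\ R_{22} & a_{23} & a_{2.} \\ \hline T_{22} & C_{23} & T_{23} \end{array}$$

$$\begin{array}{cc|c} C_{21} & C_{22} & T_{22} \\ a_{31} & a_{32} & R_{32} \\ \hline a_{.1} & a_{.2} & T_{32} \end{array} \qquad\qquad \begin{array}{cc|c} T_{22} & C_{23} & T_{23} \\ R_{32} & a_{33} & a_{3.} \\ \hline T_{32} & a_{.3} & a \end{array} \qquad (20)$$

It is easy to see that a similar extension is possible to $r \times s$ tables. We see above that we have
four independent values $\chi_1$, $\chi_2$, $\chi_3$, $\chi_4$ and $\chi^2 = \sum_{t=1}^{4} \chi_t^2$ with 4 D.F.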


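The general case of r × s tables

Theorem. Any $r \times s$ table can be reduced to $(r-1)(s-1)$ independent fourfold tables, in
each of which we can derive a value $\chi_t$. Further $\sum_{t=1}^{(r-1)(s-1)} \chi_t^2$ is asymptotically equal to $\chi^2$ as
usually calculated and is unique and equal to (observed - expected)$^2$ ÷ expected, summed
for every cell of the table.

The proof is by induction. We suppose it true for $r \times s$ tables and show it true for
$(r+1) \times s$ tables:

$$P(a_{ij} \mid a_{i.}, a_{.j}) = \frac{\prod_{i=1}^{r} a_{i.}! \prod_{j=1}^{s} C_{rj}!}{\prod_{i=1}^{r} \prod_{j=1}^{s} a_{ij}!\; T_{rs}!} \times \frac{T_{rs}!\; a_{r+1,.}! \prod_{j=1}^{s} a_{.j}!}{\prod_{j=1}^{s} C_{rj}! \prod_{j=1}^{s} a_{r+1,j}!\; a!}. \qquad (21)$$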













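The first factor can be broken up into $(r-1)(s-1)$ independent fourfold tables by hypothesis,
and the second can be broken up into $(s-1)$ according to the scheme

$$\begin{array}{cc|c} C_{r1} & C_{r2} & T_{r2} \\ a_{r+1,1} & a_{r+1,2} & R_{r+1,2} \\ \hline a_{.1} & a_{.2} & T_{r+1,2} \end{array} \qquad\qquad \begin{array}{cc|c} T_{r2} & C_{r3} & T_{r3} \\ R_{r+1,2} & a_{r+1,3} & R_{r+1,3} \\ \hline T_{r+1,2} & a_{.3} & T_{r+1,3} \end{array}$$

and so on to

$$\begin{array}{cc|c} T_{r,s-1} & C_{rs} & T_{rs} \\ R_{r+1,s-1} & a_{r+1,s} & a_{r+1,.} \\ \hline T_{r+1,s-1} & a_{.s} & a \end{array} \qquad (22)$$

Hence the $(r+1) \times s$ table can be broken up into $(r-1)(s-1) + (s-1) = r(s-1)$ fourfold
tables. Therefore, if the first part of the theorem is true for $r$ rows it is true for $r+1$ and
similarly for columns. Therefore this holds generally, since we have proved it for $2 \times 3$ and
$3 \times 3$ tables.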
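The general reduction can be sketched in code by building the fourfold tables from cumulative totals, following the layout of the schemes (20) and (22). The table below is hypothetical and the function names are illustrative; the check is that the product of the $(r-1)(s-1)$ fourfold probabilities equals the probability of the whole table exactly.

```python
from fractions import Fraction
from math import factorial

def fourfold_probability(a, b, c, d):
    num = factorial(a + b) * factorial(c + d) * factorial(a + c) * factorial(b + d)
    den = (factorial(a + b + c + d)
           * factorial(a) * factorial(b) * factorial(c) * factorial(d))
    return Fraction(num, den)

def table_probability(table):
    num = 1
    for row in table:
        num *= factorial(sum(row))
    for col in zip(*table):
        num *= factorial(sum(col))
    den = factorial(sum(map(sum, table)))
    for row in table:
        for x in row:
            den *= factorial(x)
    return Fraction(num, den)

def fourfold_tables(table):
    """Return the (r-1)(s-1) fourfold tables as tuples (a, b, c, d)."""
    r, s = len(table), len(table[0])
    # T[i][j] = total of the first i rows and first j columns.
    T = [[0] * (s + 1) for _ in range(r + 1)]
    for i in range(r):
        for j in range(s):
            T[i + 1][j + 1] = table[i][j] + T[i][j + 1] + T[i + 1][j] - T[i][j]
    out = []
    for i in range(1, r):
        for j in range(1, s):
            top_left = T[i][j]                      # T: rows and columns already pooled
            top_right = T[i][j + 1] - T[i][j]       # C: column j+1 over the pooled rows
            bottom_left = T[i + 1][j] - T[i][j]     # R: row i+1 over the pooled columns
            out.append((top_left, top_right, bottom_left, table[i][j]))
    return out

t = [[3, 1, 4, 2],
     [2, 5, 1, 3],
     [4, 2, 2, 1]]                    # hypothetical 3 x 4 table
parts = fourfold_tables(t)
product = Fraction(1)
for cells in parts:
    product *= fourfold_probability(*cells)
print(len(parts), product == table_probability(t))
```

For a 3 × 4 table this yields six fourfold tables, in agreement with $(r-1)(s-1)$.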

Thus we have proved that $P(a_{ij} \mid a_{i.}, a_{.j})$ is equal to the product of terms of form
$P(u_{ij} \mid u_{i.}, u_{.j})$, where $u_{ij}$ are the four frequencies in the cells of a certain fourfold table.
We have seen that

$$P(u_{ij} \mid u_{i.}, u_{.j}) = \left\{ \frac{u^3}{2\pi\, u_{1.} u_{2.} u_{.1} u_{.2}} \right\}^{1/2} \exp(-\tfrac12 \chi_u^2), \qquad (23)$$

and this term can be equated to a certain integral between certain limits and that finally as
an approximation we may say that $\chi_u$ is normally distributed. Since in each fourfold table
the row and column totals are fixed, each $\chi_u$ is independent of the $\chi$ of every other fourfold
table in the set because the mean value of each $\chi_u$ is zero for given marginal totals; hence the
proof follows as in the case of the multinomial.

Since the $\chi$ are uncorrelated and in larger samples may be taken to be normally distributed,
they are also independent. Thus $\sum \chi_u^2 = \chi^2$ for $(r-1)(s-1)$ degrees of freedom.

But Stirling’s approximation applied direct to the $P(a_{ij} \mid a_{i.}, a_{.j})$ by means of the
substitution $a_{ij} = a_{i.} a_{.j}/a + \xi_{ij}$, gives

$$P(a_{ij} \mid a_{i.}, a_{.j}) = \frac{(2\pi)^{-\frac12 (r-1)(s-1)}\, a^{\frac12 (rs-1)}}{\prod_i (a_{i.})^{\frac12 (s-1)} \prod_j (a_{.j})^{\frac12 (r-1)}} \exp\left\{ -\frac12 \sum_{i,j} \xi_{ij}^2\, a/(a_{i.} a_{.j}) \right\}$$

$$= K_1 \exp\left\{ -\frac12 \sum_{i,j} \xi_{ij}^2\, a/(a_{i.} a_{.j}) \right\}, \qquad (24)$$

where $K_1$ is of order $\{-\tfrac12 (r-1)(s-1)\}$ in the $a_{ij}$. We note further that $\chi_u^2$ is of $O(a_{ij}^0)$ and
$\xi_{ij}^2\, a/(a_{i.} a_{.j})$ also of $O(a_{ij}^0)$.
It now suffices to equate

$$K_2 \exp\left( -\frac12 \sum_u \chi_u^2 \right) = K_1 \exp\left\{ -\frac12 \sum_{i,j} \xi_{ij}^2\, a/(a_{i.} a_{.j}) \right\} \qquad (25)$$

to obtain an identity true within the limits of the approximation, namely,

$$\sum_u \chi_u^2 = \sum_{i,j} \xi_{ij}^2\, a/(a_{i.} a_{.j}). \qquad (26)$$


Analysis of $\chi^2$ in manifold tables


8. By the substitutions $a_{ij} = a p_{ij} + \xi_{ij}$


on the left-hand side and 




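on the right-hand side and then the use of Stirling’s approximation, we may reduce the
equation

$$P(a_{ij} \mid p_{ij}, a) = P(a_{i.} \mid p_{i.}, a)\, P(a_{.j} \mid p_{.j}, a)\, P(a_{ij} \mid a_{i.}, a_{.j})$$

to the form

$$K_3 \exp(-\tfrac12 \chi^2) = K_4 \exp(-\tfrac12 \chi_r^2) \exp(-\tfrac12 \chi_c^2) \exp(-\tfrac12 \chi_I^2), \qquad (27)$$

where $K_3$, $K_4$ are constants of the same dimensions in the $a_{ij}$, and

$$\begin{aligned}
\chi_r^2 &= \chi^2 \text{ due to rows} = \sum (a_{i.} - a p_{i.})^2/(a p_{i.}), \\
\chi_c^2 &= \chi^2 \text{ due to columns} = \sum (a_{.j} - a p_{.j})^2/(a p_{.j}), \\
\chi_I^2 &= \chi^2 \text{ due to ‘interaction or association’} = a \sum (a_{ij} - a_{i.} a_{.j}/a)^2/(a_{i.} a_{.j}), \\
\chi^2 &= \chi^2 \text{ due to all causes of variation} = \sum (a_{ij} - a p_{i.} p_{.j})^2/(a p_{i.} p_{.j}).
\end{aligned} \qquad (28)$$

From (27) it follows that $\chi^2 = \chi_r^2 + \chi_c^2 + \chi_I^2$
with corresponding degrees of freedom

$$(rs - 1) = (r-1) + (s-1) + (r-1)(s-1). \qquad (29)$$

We see now why the degrees of freedom are reduced when the $p_{i.}$ and $p_{.j}$ are estimated from
the data since, if $p_{i.}$ and $p_{.j}$ are estimated as $a_{i.}/a$ and $a_{.j}/a$ respectively, $\chi_r^2$ and $\chi_c^2$ both
vanish. $P(a_{ij} \mid a_{i.}, a_{.j})$ is independent of the $p_{i.}$ and $p_{.j}$ and hence $\chi^2$ is minimized by choosing
$p_{i.}$ and $p_{.j}$ so that $\chi_r^2$ and $\chi_c^2$ vanish. This is also the maximum likelihood solution.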
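The four quantities of (28) can be computed directly. The sketch below applies them to the 3 × 3 random sampling table of Example 1, with the theoretical value 1/3 for every $p_{i.}$ and $p_{.j}$; the variable names are illustrative. Since the relation $\chi^2 = \chi_r^2 + \chi_c^2 + \chi_I^2$ is reached through Stirling’s approximation, the sketch compares the two sides rather than assuming they are equal.

```python
table = [[3009, 2832, 3008],
         [3047, 3051, 2997],
         [2974, 3038, 3018]]          # the 3 x 3 table of Example 1
p_row = [1 / 3] * 3                   # theoretical p_i.
p_col = [1 / 3] * 3                   # theoretical p_.j

a = sum(map(sum, table))
row_tot = [sum(row) for row in table]
col_tot = [sum(col) for col in zip(*table)]
cells = [(i, j) for i in range(3) for j in range(3)]

chi2_rows = sum((row_tot[i] - a * p_row[i]) ** 2 / (a * p_row[i]) for i in range(3))
chi2_cols = sum((col_tot[j] - a * p_col[j]) ** 2 / (a * p_col[j]) for j in range(3))
chi2_inter = a * sum((table[i][j] - row_tot[i] * col_tot[j] / a) ** 2
                     / (row_tot[i] * col_tot[j]) for i, j in cells)
chi2_all = sum((table[i][j] - a * p_row[i] * p_col[j]) ** 2
               / (a * p_row[i] * p_col[j]) for i, j in cells)

print(round(chi2_rows, 5), round(chi2_cols, 5), round(chi2_inter, 5))
print(round(chi2_rows + chi2_cols + chi2_inter, 5), round(chi2_all, 5))
```

The row and column components agree with the values 3.61467 and 0.82798 tabulated for Method I (a), and the sum of the three components differs from the total only slightly, as the asymptotic argument leads one to expect.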


The exact partition of $\chi^2$ for manifold contingency tables
9. The partitions of $\chi^2$ which have been obtained are only approximate, tending to
exactness as the size of the sample is indefinitely increased. We now put Dr Irwin’s solution
(pp. 130-4 below) for the exact partition of $\chi^2$ into matrix form, giving a transformation of
the standardized variables arranged in an $(r \times s)$ matrix, to an $(r \times s)$ matrix which has the
elements of the first column and of the first row zeroes.
Helmert matrices for rows $(r \times r)$ and columns $(s \times s)$ using $p_{i.}$, $p_{.j}$, are constructed thus:

$$(r_{ij}) = R = \begin{pmatrix} \sqrt{p_{1.}} & \sqrt{p_{2.}} & \ldots & \sqrt{p_{r.}} \\ \sqrt{\dfrac{p_{2.}}{p_{1.} + p_{2.}}} & -\sqrt{\dfrac{p_{1.}}{p_{1.} + p_{2.}}} & 0 & \ldots \\ \text{etc.} & & & \end{pmatrix}, \qquad (c_{ij}) = C = \begin{pmatrix} \sqrt{p_{.1}} & \sqrt{p_{.2}} & \ldots \\ \text{etc.} & & \end{pmatrix}. \qquad (30)$$

We also write

$$(q_{ij}) = Q = \left( \frac{a_{ij} - a p_{ij}}{\sqrt{(a p_{ij})}} \right). \qquad (31)$$

We again take as our null hypothesis, $p_{ij} = p_{i.}\, p_{.j}$.

Consider the orthogonal transformation $RQ$ which gives rise to an $(r \times s)$ matrix of
standardized variables.

Then $Q'R'$ is an $(s \times r)$ matrix of standardized variables.
$CQ'R'$ gives rise to an $(s \times r)$ matrix of standardized variables. Hence $(C(RQ)')'$ is an $(r \times s)$
matrix of standardized variables, i.e. $RQC' = E = (e_{ij})$ is an $r \times s$ set of standardized
variables.

We can now prove that $RQC'$ has zeroes in the first column and first row if the data are
used to estimate $p_{i.} = a_{i.}/a$, $p_{.j} = a_{.j}/a$, and conversely, that to make the elements of the
first column and row zero we must estimate $p_{i.}$ and $p_{.j}$ by these two formulae. For

$$e_{11} = \sum_{k,l} r_{1k}\, q_{kl}\, c_{1l} = \sum_{k,l} \sqrt{(p_{k.})}\, (a_{kl} - a p_{k.} p_{.l})\, \sqrt{(p_{.l})} / \sqrt{(a p_{kl})} = 0, \qquad (32)$$

$$e_{1j} = \sum_{k,l} \sqrt{(p_{k.})}\, (a_{kl} - a p_{k.} p_{.l})\, c_{jl} / \sqrt{(a p_{kl})} = \sum_{l} c_{jl}\, (a_{.l} - a p_{.l}) / \sqrt{(a p_{.l})} = 0, \quad \text{if } p_{.l} = a_{.l}/a. \qquad (33)$$













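Similarly $e_{i1} = 0$ if $p_{i.} = a_{i.}/a$. (34)
Conversely if all $e_{1j} = 0$ we have

$$0 = e_{1j} = \sum_{k,l} c_{jl}\, \sqrt{(p_{k.})}\, (a_{kl} - a p_{k.} p_{.l}) / \sqrt{(p_{kl})}, \qquad (35)$$

$$0 = \sum_{l} c_{jl}\, (a_{.l} - a p_{.l}) / \sqrt{(p_{.l})} = \sum_{l} c_{jl}\, m_l, \qquad (36)$$

where $m_l = (a_{.l} - a p_{.l}) / \sqrt{(p_{.l})}$.

But this is a set of $s$ equations in $s$ variables and $|C| \neq 0$, so that $m_l = 0$ or $p_{.l} = a_{.l}/a$. The
matrix $E$ is thus of form

$$E = \begin{pmatrix} 0 & 0 & 0 & \ldots & 0 \\ 0 & e_{22} & e_{23} & \ldots & \\ 0 & e_{32} & e_{33} & \ldots & \\ \vdots & & & & \\ 0 & e_{r2} & \ldots & & e_{rs} \end{pmatrix} \quad (r \text{ rows}, s \text{ columns}). \qquad (37)$$

The first transformation $RQ$ ensures that the members of any given column are uncorrelated.
The second transformation $RQC'$ ensures that the members of any row are uncorrelated.
Thus all $e_{ij}$ not zero are uncorrelated. We have therefore a set of $(r-1)(s-1)$ uncorrelated
normal deviates with mean zero and unit standard deviation. Further,

$$\chi^2 = \sum\sum (\text{observed} - \text{expected})^2 \div \text{expected} = \sum_{i,j} e_{ij}^2 \qquad (i = 2, 3, \ldots, r;\; j = 2, 3, \ldots, s), \qquad (38)$$

where $p_{i.} = a_{i.}/a$, $p_{.j} = a_{.j}/a$ have been estimated from the data. This is in effect the
theorem on p. 291 of the article of J. Neyman & E. S. Pearson (1928).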
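The double Helmert transformation can be sketched without any matrix library. In the code below the general row of the Helmert matrix is written out on the pattern of the first two rows of (30); that general form, the helper names and the 2 × 3 table are assumptions made for illustration. With the margins used as estimates, the first row and column of $E$ should vanish and the remaining squares should sum to the usual $\chi^2$.

```python
import math

def helmert(p):
    """Helmert-type orthogonal matrix for class probabilities p (summing to 1)."""
    n = len(p)
    rows = [[math.sqrt(x) for x in p]]
    cum = p[0]                                    # p_1 + ... + p_{k-1}
    for k in range(1, n):
        nxt = cum + p[k]
        row = [math.sqrt(p[j] * p[k] / (cum * nxt)) for j in range(k)]
        row.append(-math.sqrt(cum / nxt))
        row += [0.0] * (n - k - 1)
        rows.append(row)
        cum = nxt
    return rows

def matmul(x, y):
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*y)] for row in x]

def transpose(x):
    return [list(col) for col in zip(*x)]

table = [[12, 7, 9],
         [8, 15, 11]]                 # hypothetical 2 x 3 table
a = sum(map(sum, table))
p_row = [sum(row) / a for row in table]           # p_i. estimated from the margins
p_col = [sum(col) / a for col in zip(*table)]     # p_.j estimated from the margins

Q = [[(table[i][j] - a * p_row[i] * p_col[j]) / math.sqrt(a * p_row[i] * p_col[j])
      for j in range(len(p_col))] for i in range(len(p_row))]
E = matmul(matmul(helmert(p_row), Q), transpose(helmert(p_col)))

pearson = sum(q * q for row in Q for q in row)
interaction = sum(E[i][j] ** 2 for i in range(1, len(E)) for j in range(1, len(E[0])))
print([round(x, 12) for x in E[0]], round(E[1][0], 12))
print(pearson, interaction)
```

Because both transformations are orthogonal, the total sum of squares is unchanged, so once the first row and column are zero the whole of $\chi^2$ is carried by the remaining $(r-1)(s-1)$ elements.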


10. The author’s solution of the $(r \times s)$ contingency table consists essentially of using a
double Helmert transformation and testing out successive fourfold tables for ‘interaction’
or ‘association’ by means of estimates of $p_{i.}$ and $p_{.j}$ derived from local row and column
totals, instead of those derived from the marginal totals of the whole table. Any comparison
involving row or column totals in these fourfold tables is set aside to be considered in the
adjoining table.

If we use the marginal totals of the whole table, the partition can be made exact as is 
shown in the proof above. A numerical example is given later which illustrates these points 
in detail. 


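11. In the special case of the $2 \times 2$ table we have

$$q_{ij} = (a_{ij} - a p_{ij}) / \sqrt{(a p_{ij})}, \qquad p_{ij} = p_{i.}\, p_{.j}. \qquad (39)$$

Transform $(q_{ij})$ as follows:

$$\begin{pmatrix} \sqrt{p_{1.}} & \sqrt{p_{2.}} \\ \sqrt{p_{2.}} & -\sqrt{p_{1.}} \end{pmatrix} \begin{pmatrix} q_{11} & q_{12} \\ q_{21} & q_{22} \end{pmatrix} \begin{pmatrix} \sqrt{p_{.1}} & \sqrt{p_{.2}} \\ \sqrt{p_{.2}} & -\sqrt{p_{.1}} \end{pmatrix}$$

$$= \begin{pmatrix} q_{11}\sqrt{p_{11}} + q_{21}\sqrt{p_{21}} + q_{12}\sqrt{p_{12}} + q_{22}\sqrt{p_{22}} & q_{11}\sqrt{p_{12}} + q_{21}\sqrt{p_{22}} - q_{12}\sqrt{p_{11}} - q_{22}\sqrt{p_{21}} \\ q_{11}\sqrt{p_{21}} - q_{21}\sqrt{p_{11}} + q_{12}\sqrt{p_{22}} - q_{22}\sqrt{p_{12}} & q_{11}\sqrt{p_{22}} - q_{21}\sqrt{p_{12}} - q_{12}\sqrt{p_{21}} + q_{22}\sqrt{p_{11}} \end{pmatrix} \qquad (40)$$

$$= \text{matrix of } \chi\text{’s} = \begin{pmatrix} \text{Total} & \text{columns} \\ \text{rows} & \text{‘interaction’} \end{pmatrix}. \qquad (41)$$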









In this case $\chi$ (total) $= 0$, since we made $a = \sum_{i,j} a_{ij}$;
$\chi$ (rows) and $\chi$ (columns) become zero if we take $p_{i.} = a_{i.}/a$, $p_{.j} = a_{.j}/a$ respectively.

We are reduced to the single ‘interaction’ term for 1 D.F. Alternatively we may use the
matrix to find $p_{i.}$, $p_{.j}$ such as to render $\chi$ (rows) and $\chi$ (columns) zero. The solutions will be
$p_{i.} = a_{i.}/a$, $p_{.j} = a_{.j}/a$ respectively.


EXAMPLES OF THE PARTITION OF $\chi^2$ BY SEVERAL METHODS


Example 1 


There were available data from a random sampling experiment which had been carried out 
for another purpose, comprising the totals of samples from Poisson populations with means 
15 and 30. These were totalled for consecutive sets of 200 and 100 drawings respectively. 
These data are used because we are able to give theoretical values independent of the observed
frequencies to the $p_{i.}$ and $p_{.j}$, and the frequencies are sufficiently large to illustrate the
approximation of the different sets of $\chi^2$ calculated by the different methods.

We have analysed this table in three different ways. 

I(a) The first method treats each frequency as a member of a Poisson population with
mean 26,974/9, that is, 2997.11, and S.D. $\sqrt{2997.11}$.

I(b) We have calculated the standardized variables $q_{ij} = 3(a_{ij} - \tfrac{1}{9}a)/\sqrt{a}$ and arranged
them in the Q matrix. We have calculated the matrices for rows and columns, R and C of
Helmert form, using the theoretical value of $\tfrac{1}{3}$ for the $p_{i.}$ and $p_{.j}$. Thus in this special case

\[
R = C = \begin{bmatrix}
1/\sqrt{3} & 1/\sqrt{3} & 1/\sqrt{3} \\
1/\sqrt{2} & -1/\sqrt{2} & 0 \\
1/\sqrt{6} & 1/\sqrt{6} & -2/\sqrt{6}
\end{bmatrix}. \tag{42}
\]

Since the pre- and post-multiplication with such an orthogonal matrix leaves the sums of
squares invariant, we obtain a matrix E such that $\sum_{i,j} e_{ij}^2 = \chi^2$. $e_{11} = 0$ since we have used the
total number in the calculation of the expected in individual cells.

II. We have computed the usual $\chi^2 = \sum(\text{observed} - \text{expected})^2 / \text{expected}$, using the
marginal totals to estimate $p_{i.}$ and $p_{.j}$ for the calculation of the expected. With these same
values for $p_{i.}$ and $p_{.j}$ we have computed the matrix of standardized variables Q and the
Helmert matrices R and C.

III. Fourfold tables have been made according to the method used in the proof of §7.
$\chi^2$ has been computed using the marginal totals of each of these fourfold tables so that the
estimate of $p_{i.}$ differs from table A to table B. We have taken the square root of $\chi^2$ to obtain $\chi$
and given it a sign according to the sign of (observed − expected) in the top left corner of the
same table. It is then easily seen that these four values closely approximate to the four
values of $\chi$ corresponding to 'interaction' in the other two $\chi$ matrices, thus giving a numerical
verification of the asymptotic equivalence of the three methods of computing.
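The partitions described under I(b) and II can be checked numerically. What follows is a minimal sketch in Python (standard library only), assuming a Helmert-type matrix whose first row is $\sqrt{p_j}$ and whose later rows contrast each class with the pooled classes before it. The names `helmert`, `partition` and `components` are illustrative and do not come from the paper.

```python
import math

# Observed 3 x 3 table of Example 1 (random sampling data).
TABLE = [[3009, 2832, 3008],
         [3047, 3051, 2997],
         [2974, 3038, 3018]]

def helmert(p):
    """Orthogonal Helmert-type matrix whose first row is sqrt(p_j)."""
    n = len(p)
    rows = [[math.sqrt(x) for x in p]]
    for i in range(1, n):
        s_prev = sum(p[:i])      # sum of the i classes preceding class i+1
        s_curr = s_prev + p[i]   # the same sum including class i+1
        row = [math.sqrt(p[i] * p[j] / (s_prev * s_curr)) for j in range(i)]
        row.append(-math.sqrt(s_prev / s_curr))
        row += [0.0] * (n - i - 1)
        rows.append(row)
    return rows

def matmul(x, y):
    """Plain matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def partition(table, p_rows, p_cols):
    """Return E = R Q C' for the standardized deviations Q."""
    a = sum(map(sum, table))
    q = [[(table[i][j] - a * p_rows[i] * p_cols[j])
          / math.sqrt(a * p_rows[i] * p_cols[j])
          for j in range(len(p_cols))] for i in range(len(p_rows))]
    c_t = [list(col) for col in zip(*helmert(p_cols))]   # transpose of C
    return matmul(matmul(helmert(p_rows), q), c_t)

def components(e):
    """Chi-square for rows, columns and residual (interaction) read off E."""
    rows = sum(e[i][0] ** 2 for i in range(1, len(e)))
    cols = sum(x ** 2 for x in e[0][1:])
    resid = sum(x ** 2 for row in e[1:] for x in row[1:])
    return rows, cols, resid

# Method I(b): theoretical p = 1/3 for rows and columns.
third = [1 / 3] * 3
e_theory = partition(TABLE, third, third)
rows_i, cols_i, resid_i = components(e_theory)

# Method II: p estimated from the marginal totals.
total = sum(map(sum, TABLE))
p_r = [sum(row) / total for row in TABLE]
p_c = [sum(col) / total for col in zip(*TABLE)]
e_margin = partition(TABLE, p_r, p_c)
rows_ii, cols_ii, resid_ii = components(e_margin)

print(round(rows_i, 5), round(cols_i, 5), round(resid_i, 5))
print(round(resid_ii, 5))
```

With the marginal estimates the row and column components vanish up to rounding error, so the whole of $\chi^2$ falls on the four interaction elements.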





126 Partition of $\chi^2$ in certain discrete distributions


The 3 × 3 contingency table (random sampling data)

($p_{ij} = 1/9$ for all i, j. Expected total 27,000)

    3009    2832    3008  |   8849
    3047    3051    2997  |   9095
    2974    3038    3018  |   9030
    9030    8921    9023  |  26974

Method I (a). Analysis of $\chi^2$ by usual method with $p_{i.} = p_{.j} = \tfrac{1}{3}$.

                D.F.     $\chi^2$       Identification with E below
    Rows         2      3.61467     $\sum_{i=2}^{3} e_{i1}^2$
    Columns      2      0.82798     $\sum_{j=2}^{3} e_{1j}^2$
    Residual     4      7.42107     $e_{22}^2 + e_{23}^2 + e_{32}^2 + e_{33}^2$
    Total        8     11.86372     $\sum_{i,j} e_{ij}^2$

Method I (b).

\[
Q = \begin{bmatrix}
0.217165 & -3.015955 & 0.198899 \\
0.911281 & 0.984346 & -0.002030 \\
-0.422153 & 0.746885 & 0.381561
\end{bmatrix}
\]

\[
R = C = \begin{bmatrix}
0.577350 & 0.577350 & 0.577350 \\
0.707107 & -0.707107 & 0 \\
0.408248 & 0.408248 & -0.816496
\end{bmatrix}
\]

\[
RQ = \begin{bmatrix}
0.407778 & -0.741735 & 0.333957 \\
-0.490814 & -2.828641 & 0.142078 \\
0.805372 & -1.439229 & -0.231172
\end{bmatrix}
\]

\[
E = RQC' = \begin{bmatrix}
0 & 0.812829 & -0.409012 \\
-1.834459 & 1.653094 & -1.471167 \\
-0.499424 & 1.587173 & -0.070020
\end{bmatrix}
\]

It will be noted that in matrix RQ the sum of the squares of the elements of the first row is
equal to the $\chi^2$ in the table for columns.

Method II. The value of $\chi^2 = \sum(x - m)^2/m$, when the expectations are computed from the
marginal totals, is 7.54658. The Helmert matrices for rows and columns, when the $p_{i.}$ and
$p_{.j}$ are estimated from marginal totals, are as follows:

\[
C = \begin{bmatrix}
0.578590 & 0.575088 & 0.578366 \\
0.704957 & -0.709250 & 0 \\
0.410206 & 0.407723 & -0.815777
\end{bmatrix}
\]

\[
R = \begin{bmatrix}
0.572762 & 0.580669 & 0.578590 \\
0.711937 & -0.702243 & 0 \\
0.406311 & 0.411920 & -0.815618
\end{bmatrix}
\]

\[
Q = \begin{bmatrix}
0.857076 & -1.748555 & 0.881236 \\
0.041607 & 0.784907 & -0.822089 \\
-0.890199 & 0.943215 & -0.047324
\end{bmatrix}
\]

Sum of squares = tabular $\chi^2$ = 7.54658.











H. O. LANCASTER 127 


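\[
RQ = \begin{bmatrix}
0 & 0 & 0 \\
0.580966 & -1.796056 & 1.204686 \\
1.091440 & -1.156441 & 0.058022
\end{bmatrix}
\]

Sum of squares = 7.54657 = $\chi^2$.

\[
E = RQC' = \begin{bmatrix}
0 & 0 & 0 \\
0 & 1.683409 & -1.47673 \\
0 & 1.589624 & -0.07112
\end{bmatrix}
\]

$\sum_{i,j} e_{ij}^2$ = the sum of squares = 7.54657 = $\chi^2$.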


The transformation R applied to Q treats each column as though it were a separate
multinomial. The two successive transformations of Q are both orthogonal and so leave the
sum of squares invariant. The estimates of $p_{i.}$ and $p_{.j}$ are efficient (i.e. those of maximum
likelihood) and so make the marginal elements corresponding to the row and column totals
zero.

Method III. The formation of four fourfold tables (from which $\chi^2$ is calculated in the usual
manner). This partition is only exact asymptotically.




















                A                                   B
    3009    2832  |   5841           5841    3008  |   8849
    3047    3051  |   6098           6098    2997  |   9095
    6056    5883  |  11939          11939    6005  |  17944

                C                                   D
    6056    5883  |  11939          11939    6005  |  17944
    2974    3038  |   6012           6012    3018  |   9030
    9030    8921  |  17951          17951    9023  |  26974


We may write for convenience in the form of a matrix

\[
\chi^2 = \begin{bmatrix} \ldots & \ldots \\ 2.52637 & 0.00506 \end{bmatrix}
\]

and corresponding to these values of $\chi$

\[
\chi = \begin{bmatrix} 1.691 & -1.477 \\ 1.589 & -0.071 \end{bmatrix}.
\]

The partitions of the $\chi^2$ of the contingency table by the above three methods evidently give
approximately the same results. I and II are exact on the hypothesis made.
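Method III can be reproduced directly from the 3 × 3 table. The sketch below (Python, standard library only) forms the four fourfold tables A to D by pooling and computes a signed $\chi$ for each from that table's own margins. The helper names `signed_chi` and `fourfold_tables` are illustrative, not the author's.

```python
import math

TABLE = [[3009, 2832, 3008],
         [3047, 3051, 2997],
         [2974, 3038, 3018]]

def signed_chi(a, b, c, d):
    """Signed chi for the fourfold table [[a, b], [c, d]] using its own margins.

    The sign of (a*d - b*c) equals the sign of observed minus expected
    in the top left corner.
    """
    n = a + b + c + d
    return (a * d - b * c) * math.sqrt(n / ((a + b) * (c + d) * (a + c) * (b + d)))

def fourfold_tables(t):
    """Tables A, B, C, D formed by successive pooling of a 3 x 3 table."""
    a = (t[0][0], t[0][1], t[1][0], t[1][1])
    b = (t[0][0] + t[0][1], t[0][2], t[1][0] + t[1][1], t[1][2])
    c = (t[0][0] + t[1][0], t[0][1] + t[1][1], t[2][0], t[2][1])
    d = (sum(t[0][:2]) + sum(t[1][:2]), t[0][2] + t[1][2],
         t[2][0] + t[2][1], t[2][2])
    return {"A": a, "B": b, "C": c, "D": d}

chi = {name: signed_chi(*cells) for name, cells in fourfold_tables(TABLE).items()}
for name, value in chi.items():
    print(name, round(value, 3), round(value ** 2, 5))
```

Because each table uses its own margins, the four $\chi^2$ do not add exactly to the tabular $\chi^2$; the agreement is asymptotic, as stated under Method III.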


Example 2 

The partition of $\chi^2$ is useful in the following type of case which arises frequently in
bacteriology.

Measured constant amounts of a liquid suspension of a bacterial culture are mixed with
an equal quantity of disinfectant solution of known concentration, and a plate is 'poured'
and the number of colonies developing are noted. For each plate the concentration of
disinfectant used is given by some series such as $1, r, r^2, r^3, \ldots$, where r is some factor such as
2 or 1.5. In such a case the following results might be obtained.

Number of colonies ($a_i$) developing in successive plates 427, 440, 494, 422, 409, 310, 302.
We are interested in finding the point at which the disinfectant began to inhibit growth.
It is convenient now to write the cumulated sums

427, 867, 1361, 1783, 2192, 2502, 2804,











and the differences $\{a_1 + a_2 + \ldots + a_{k-1} - (k-1)a_k\}$: −13, −121, 95, 147, 642, 690. Then we
may illustrate the partition of $\chi^2$ by means of a table where the successive $\chi^2$ of the exact
partition are given by
\[
\chi_k^2 = \{a_1 + a_2 + \ldots + a_{k-1} - (k-1)a_k\}^2 / \{\bar{a}\,k(k-1)\},
\]
and the $\chi^2$ of the binomial partition are given by
\[
\chi_k^2 = \{a_1 + a_2 + \ldots + a_{k-1} - (k-1)a_k\}^2 / \{(a_1 + a_2 + \ldots + a_k)(k-1)\}.
\]
The total $\chi^2$ may be obtained by the usual formula
\[
\chi^2 = \sum (x - \bar{x})^2/\bar{x} = n\sum x^2 / \sum x - \sum x = 73.474 \text{ for 6 D.F.}
\]

    Binomial      Exact        Value
    partition     partition    of k
      0.195         0.211        2
      5.379         6.092        3
      1.687         1.877        4
      2.465         2.697        5
     32.947        34.298        6
     28.299        28.299        7
     70.973        73.474      Total

















The comparison in each case is that between plate k and the plates preceding it. In those
cases where the null hypothesis is not true the discrepancy between the two methods will
be high. Gumbel (1943) has criticized the $\chi^2$ test in cases where the data are grouped, but
from the point of view of the exact partition of $\chi^2$ given above his argument appears to have
no force.
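The two partitions of Example 2 follow mechanically from the colony counts. A short sketch in Python, using only the formulae quoted above (variable names are illustrative):

```python
# Colony counts a_1, ..., a_7 from Example 2.
counts = [427, 440, 494, 422, 409, 310, 302]
n = len(counts)
mean = sum(counts) / n

exact, binomial = [], []
running = counts[0]                       # a_1 + ... + a_{k-1}
for k in range(2, n + 1):
    a_k = counts[k - 1]
    diff = running - (k - 1) * a_k        # plate k against the plates before it
    exact.append(diff ** 2 / (mean * k * (k - 1)))
    binomial.append(diff ** 2 / ((running + a_k) * (k - 1)))
    running += a_k

# Total chi-square by the usual formula n*sum(x^2)/sum(x) - sum(x).
total = n * sum(x * x for x in counts) / sum(counts) - sum(counts)

for k, (b, e) in enumerate(zip(binomial, exact), start=2):
    print(k, round(b, 3), round(e, 3))
print(round(sum(binomial), 3), round(sum(exact), 3), round(total, 3))
```

The exact components add to the total $\chi^2$, while the binomial components fall short of it; the largest components are those for plates 6 and 7.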





Example 3 
Consider the following fourfold table (Roberts, Dawson & Madden, 1939, last line, p. 60):

              P       p
    B        942     900  |  1842
    b        956     936  |  1892
            1898    1836  |  3734

By the use of an expectation in each class of 933.5, we find $\chi^2$ = 1.82860. The partition is:

                    Exact partition                Using
                    using theoretical ratio        observed data
    Rows                 0.66952                       0
    Columns              1.02946                       0
    Interaction          0.12962                       0.13965
    Total                1.82860 (3 D.F.)              0.13965 (1 D.F.)
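The figures of Example 3 can be recomputed in a few lines. The sketch below (Python) assumes the theoretical expectation of one quarter of the total in each cell, as in the text, and uses the ordinary fourfold formula for the column headed 'Using observed data'.

```python
a, b, c, d = 942, 900, 956, 936          # cells B-P, B-p, b-P, b-p
n = a + b + c + d
e = n / 4                                 # theoretical expectation, 933.5

total = sum((x - e) ** 2 / e for x in (a, b, c, d))
rows = ((a + b) - (c + d)) ** 2 / n       # 1 D.F. for rows
cols = ((a + c) - (b + d)) ** 2 / n       # 1 D.F. for columns
inter = (a - b - c + d) ** 2 / n          # 1 D.F. for interaction

# Usual fourfold chi-square with expectations from the observed margins.
observed = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

print(round(rows, 5), round(cols, 5), round(inter, 5), round(total, 5))
print(round(observed, 5))
```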

























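The same results would have been obtained had we made the transformation of the
standardized variables by means of

\[
\begin{bmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\ -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{bmatrix}
\begin{bmatrix} q_{11} & q_{12} \\ q_{21} & q_{22} \end{bmatrix}
\begin{bmatrix} \dfrac{1}{\sqrt{2}} & -\dfrac{1}{\sqrt{2}} \\ \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{bmatrix}
\]

or using the $p_{i.}$, $p_{.j}$ from data

\[
\begin{bmatrix} \sqrt{p_{1.}} & \sqrt{p_{2.}} \\ -\sqrt{p_{2.}} & \sqrt{p_{1.}} \end{bmatrix}
\begin{bmatrix} q_{11} & q_{12} \\ q_{21} & q_{22} \end{bmatrix}
\begin{bmatrix} \sqrt{p_{.1}} & -\sqrt{p_{.2}} \\ \sqrt{p_{.2}} & \sqrt{p_{.1}} \end{bmatrix}.
\]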








I have to thank Prof. A. Bradford Hill, Department of Medical Statistics of the London 
School of Hygiene, for the facilities of his department. It is also a pleasure to record the 
assistance given by Dr J. O. Irwin and Mr P. Armitage of the same department in helping 


to clear up doubtful points. The work was completed while the author was a Rockefeller 
Fellow in Medicine. 


REFERENCES 


Fisher, R. A. (1922). On the interpretation of $\chi^2$ from contingency tables and the calculation of P.
J. R. Statist. Soc. 85, 87.

Fisher, R. A. (1924). The conditions under which $\chi^2$ measures the discrepancy between observation
and hypothesis. J. R. Statist. Soc. 87, 442.

Fisher, R. A. (1944). Statistical Methods for Research Workers, 9th ed. revised. Edinburgh: Oliver
and Boyd Ltd.

Gumbel, E. J. (1943). On the reliability of the classical $\chi^2$-test. Ann. Math. Statist. 14, 253.

Irwin, J. O. (1949). A note on the subdivision of $\chi^2$ into components. Biometrika, 36, 130.

Neyman, J. & Pearson, E. S. (1928). On the use and interpretation of certain test criteria for purposes
of statistical inference. Biometrika, 20A, 175, 263.

Pearson, K. (1900). On a criterion that a given system of deviations from the probable in the case of
a correlated system of variables is such that it can be reasonably supposed to have arisen in
random sampling. Phil. Mag. (5), 50, 157.

Roberts, E., Dawson, W. M. & Madden, Margaret (1939). Observed and theoretical ratios in
Mendelian inheritance. Biometrika, 31, 56.

Yule, G. U. (1922). On the application of the $\chi^2$ method to association and contingency tables, with
experimental illustrations. J. R. Statist. Soc. 85, 95.


Biometrika 36 








[ 130 ] 


A NOTE ON THE SUBDIVISION OF $\chi^2$ INTO COMPONENTS
By J. O. IRWIN 


1. Some years ago (Irwin, 1942) I derived the $\chi^2$ distribution for the weighted sum of
squares of deviations from their weighted mean of n normally distributed variables, the
weights being inversely proportional to the variances.

By taking n linear functions of the original variables in which the coefficients are given by
a matrix of Helmert's type, $\sum w(x - \bar{x})^2$ was partitioned exactly into the squares of (n − 1)
independent transformed variables each with unit variance. The result was

\[
\sum_{i=1}^{n} w_i (x_i - \bar{x})^2
= \sum_{i=1}^{n-1} \frac{w_{i+1}\sum_{j=1}^{i} w_j}{\sum_{j=1}^{i+1} w_j}
\left( \frac{\sum_{j=1}^{i} w_j x_j}{\sum_{j=1}^{i} w_j} - x_{i+1} \right)^2. \tag{1}
\]

If the original variables are normally distributed the $\chi^2$ distribution follows immediately.
If $a_1, a_2, \ldots, a_n$ are n observed frequencies with expectation $m_1, m_2, \ldots, m_n$, the familiar
expression for $\chi^2$, $\sum (a - m)^2/m$, may be analogously partitioned.
Let $m_j = ap_j$, where a is the total frequency expected, which is the same as the total
observed frequency, if sample size is held constant, and consider the standardized variables

\[
x_j = \frac{a_j - ap_j}{\sqrt{(ap_j)}} \quad (j = 1, 2, \ldots, n). \tag{2}
\]
7 
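Transform these by means of the matrix $[L] = [l_{ij}]$, where

\[
\begin{aligned}
l_{1j} &= \sqrt{(p_j)} && (j = 1, 2, \ldots, n), \\
l_{ij} &= \sqrt{\left\{ p_i p_j \Big/ \left(\sum_{t=1}^{i-1} p_t\right)\left(\sum_{t=1}^{i} p_t\right) \right\}} && (i = 2, 3, \ldots, n;\; j = 1, 2, \ldots, i-1), \\
l_{ii} &= -\sqrt{\left\{ \sum_{t=1}^{i-1} p_t \Big/ \sum_{t=1}^{i} p_t \right\}} && (i = 2, 3, \ldots, n), \\
l_{ij} &= 0 && (i = 2, 3, \ldots, n;\; j = i+1, i+2, \ldots, n).
\end{aligned} \tag{3}
\]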








It is easy to see that the matrix is orthogonal. 
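The orthogonality can also be confirmed numerically. A minimal sketch in Python, building the matrix of (3) for a hypothetical set of probabilities (the values 0.1, 0.2, 0.3, 0.4 are illustrative only) and checking that $LL'$ is the unit matrix:

```python
import math

def irwin_matrix(p):
    """Matrix L of (3): first row sqrt(p_j), then one contrast row per class."""
    n = len(p)
    rows = [[math.sqrt(x) for x in p]]
    for i in range(1, n):
        s_prev = sum(p[:i])          # p_1 + ... + p_{i} in 1-based notation
        s_curr = s_prev + p[i]
        row = [math.sqrt(p[i] * p[j] / (s_prev * s_curr)) for j in range(i)]
        row.append(-math.sqrt(s_prev / s_curr))
        row += [0.0] * (n - i - 1)
        rows.append(row)
    return rows

def is_orthogonal(m, tol=1e-12):
    """True when m times its transpose equals the identity within tol."""
    n = len(m)
    for i in range(n):
        for j in range(n):
            dot = sum(x * y for x, y in zip(m[i], m[j]))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return True

p = [0.1, 0.2, 0.3, 0.4]             # hypothetical probabilities summing to 1
L = irwin_matrix(p)
print(is_orthogonal(L))
```

Each contrast row is orthogonal to the first row because its positive elements, weighted by $\sqrt{p_j}$, balance the single negative element on the diagonal.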


The frequencies $a_j$ (j = 1, 2, ..., n) are samples from a multinomial distribution. It is well
known that this multinomial distribution can be regarded as a subdistribution of the n-fold
distribution formed by n independent Poisson variates, when we introduce the restriction
that the sample size a is to be held constant.

Hence the variables $x_j$ in (2) can be defined to be independent standardized Poisson
variates, and the restriction that a is to be kept constant can be dealt with later.


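The variates $x_j$ are independent, have zero means and unit variances, and are therefore
uncorrelated. Since L is orthogonal the variates

\[
u_i = \sum_{j=1}^{n} l_{ij} x_j \quad (i = 1, 2, \ldots, n)
\]

have also zero means and unit variances and are uncorrelated. Hence

\[
\sum_{i=1}^{n} u_i^2 = \sum_{j=1}^{n} x_j^2 = \sum_{j=1}^{n} \frac{(a_j - ap_j)^2}{ap_j}. \tag{4}
\]

But
\[
u_1 = \sum_{j=1}^{n} \frac{a_j - ap_j}{\sqrt{a}} = 0 \quad \text{if} \quad \sum_{j=1}^{n} a_j = a,
\]
that is, if the sample size a is kept constant.

Hence
\[
\sum_{i=2}^{n} u_i^2 = \sum_{j=1}^{n} x_j^2,
\]
or after a little reduction

\[
\sum_{j=1}^{n} \frac{(a_j - ap_j)^2}{ap_j}
= \sum_{j=1}^{n-1}
\frac{\left( \dfrac{a_1 + a_2 + \ldots + a_j}{p_1 + p_2 + \ldots + p_j} - \dfrac{a_{j+1}}{p_{j+1}} \right)^2}
{a\left( \dfrac{1}{p_1 + p_2 + \ldots + p_j} + \dfrac{1}{p_{j+1}} \right)}
\]

or

\[
\sum_{j=1}^{n} \frac{(a_j - m_j)^2}{m_j}
= \sum_{j=1}^{n-1}
\frac{\left( \dfrac{a_1 + a_2 + \ldots + a_j}{m_1 + m_2 + \ldots + m_j} - \dfrac{a_{j+1}}{m_{j+1}} \right)^2}
{\dfrac{1}{m_1 + m_2 + \ldots + m_j} + \dfrac{1}{m_{j+1}}}. \tag{5}
\]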
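Identity (5) is easy to test on a small case. The following Python sketch uses hypothetical frequencies and expectations (chosen only so that the observed and expected totals agree) and compares the sum of the n − 1 components with the familiar $\chi^2$.

```python
# Hypothetical observed frequencies and expectations with equal totals (100).
observed = [18, 25, 30, 27]
expected = [20, 20, 30, 30]

def chi_square(a, m):
    """Familiar expression: sum of (a - m)^2 / m."""
    return sum((x - y) ** 2 / y for x, y in zip(a, m))

def components(a, m):
    """Right-hand side of (5): one component per j = 1, ..., n-1."""
    out = []
    cum_a, cum_m = a[0], m[0]             # a_1 + ... + a_j and m_1 + ... + m_j
    for j in range(1, len(a)):
        num = (cum_a / cum_m - a[j] / m[j]) ** 2
        den = 1 / cum_m + 1 / m[j]
        out.append(num / den)
        cum_a += a[j]
        cum_m += m[j]
    return out

parts = components(observed, expected)
print([round(x, 4) for x in parts], round(sum(parts), 4))
print(round(chi_square(observed, expected), 4))
```

Each component compares class j + 1 with the pooled classes before it, which is the comparison made by the corresponding row of the matrix in (3).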


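The explicit form of the matrix [L′] for four variables is shown in Table 1, where $\chi_0 = u_1$,
$\chi_1 = u_2$, etc. The transposition is purely for convenience in printing.

Table 1. Explicit form of the matrix L′ for four variables.

\[
\begin{array}{c|cccc}
\text{Standardized variables} & \chi_0 & \chi_1 & \chi_2 & \chi_3 \\ \hline
\dfrac{a_1 - ap_1}{\sqrt{(ap_1)}} & \sqrt{p_1} & \sqrt{\dfrac{p_2p_1}{p_1(p_1+p_2)}} & \sqrt{\dfrac{p_3p_1}{(p_1+p_2)(p_1+p_2+p_3)}} & \sqrt{\dfrac{p_4p_1}{(p_1+p_2+p_3)(p_1+p_2+p_3+p_4)}} \\[2ex]
\dfrac{a_2 - ap_2}{\sqrt{(ap_2)}} & \sqrt{p_2} & -\sqrt{\dfrac{p_1}{p_1+p_2}} & \sqrt{\dfrac{p_3p_2}{(p_1+p_2)(p_1+p_2+p_3)}} & \sqrt{\dfrac{p_4p_2}{(p_1+p_2+p_3)(p_1+p_2+p_3+p_4)}} \\[2ex]
\dfrac{a_3 - ap_3}{\sqrt{(ap_3)}} & \sqrt{p_3} & 0 & -\sqrt{\dfrac{p_1+p_2}{p_1+p_2+p_3}} & \sqrt{\dfrac{p_4p_3}{(p_1+p_2+p_3)(p_1+p_2+p_3+p_4)}} \\[2ex]
\dfrac{a_4 - ap_4}{\sqrt{(ap_4)}} & \sqrt{p_4} & 0 & 0 & -\sqrt{\dfrac{p_1+p_2+p_3}{p_1+p_2+p_3+p_4}}
\end{array}
\]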








2. Now it is known that if $X = [x_{ij}]$ (i = 1, 2, ..., r; j = 1, 2, ..., s) is the matrix of variables
in a two-way (r × s) analysis of variance, and if the sum of squares for rows can be split up
into the squares of (r − 1) linear components each corresponding to one degree of freedom
and having a matrix of coefficients $L_1 = [l_{ij}]$, $L_2 = [\lambda_{ij}]$ being the similar matrix for the
'columns' sum of squares, then the matrix corresponding to the linear functions whose
squares when summed form the interaction sum of squares is $Q = (L_1 X L_2')$. Here $L_1$ is a
(r − 1, r) matrix, X is (r × s) and $L_2'$ is (s, s − 1), so that Q is (r − 1, s − 1), as it should be.


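For example, if

\[
L_1 = \begin{bmatrix} l_{11} & l_{12} & l_{13} \\ l_{21} & l_{22} & l_{23} \end{bmatrix}, \qquad
L_2 = \begin{bmatrix} \lambda_{11} & \lambda_{12} & \lambda_{13} \\ \lambda_{21} & \lambda_{22} & \lambda_{23} \end{bmatrix}
\]

and X is a (3 × 3) matrix, Q is a (2 × 2) matrix of linear functions of the nine x's whose
coefficients are

\[
\begin{array}{ccc|ccc}
l_{11}\lambda_{11} & l_{11}\lambda_{12} & l_{11}\lambda_{13} & l_{11}\lambda_{21} & l_{11}\lambda_{22} & l_{11}\lambda_{23} \\
l_{12}\lambda_{11} & l_{12}\lambda_{12} & l_{12}\lambda_{13} & l_{12}\lambda_{21} & l_{12}\lambda_{22} & l_{12}\lambda_{23} \\
l_{13}\lambda_{11} & l_{13}\lambda_{12} & l_{13}\lambda_{13} & l_{13}\lambda_{21} & l_{13}\lambda_{22} & l_{13}\lambda_{23} \\ \hline
l_{21}\lambda_{11} & l_{21}\lambda_{12} & l_{21}\lambda_{13} & l_{21}\lambda_{21} & l_{21}\lambda_{22} & l_{21}\lambda_{23} \\
l_{22}\lambda_{11} & l_{22}\lambda_{12} & l_{22}\lambda_{13} & l_{22}\lambda_{21} & l_{22}\lambda_{22} & l_{22}\lambda_{23} \\
l_{23}\lambda_{11} & l_{23}\lambda_{12} & l_{23}\lambda_{13} & l_{23}\lambda_{21} & l_{23}\lambda_{22} & l_{23}\lambda_{23}
\end{array}
\]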





By choosing $L_1$, $L_2$ to be matrices of the type given by (3) and X to be the matrix whose
elements are given in (7) below, it follows that the interaction sum of squares is

\[
\sum_{i=1}^{r-1}\sum_{j=1}^{s-1} u_{ij}^2, \tag{6}
\]

where $u_{ij} = (l_{ik})\,X\,[\lambda_{jl}]$, and $(l_{ik})$ is a row vector and $[\lambda_{jl}]$ a column vector, or

\[
u_{ij} = (l_{ik}) \left[ \frac{a_{kl} - ap_{kl}}{\sqrt{(ap_{kl})}} \right] [\lambda_{jl}],
\]

and this can be shown to reduce to

\[
u_{ij} = (l_{ik}) \left[ \frac{a_{kl}}{\sqrt{(ap_{kl})}} \right] [\lambda_{jl}]. \tag{6 bis}
\]

Now let us consider an (r × s) contingency table where the observed and true expected
frequencies are respectively

\[
a_{ij}, \quad ap_{ij} \quad (i = 1, 2, \ldots, r;\; j = 1, 2, \ldots, s),
\]

and choose standardized variables

\[
x_{ij} = \frac{a_{ij} - ap_{ij}}{\sqrt{(ap_{ij})}}. \tag{7}
\]

Let

\[
x_{i.} = \frac{a_{i.} - ap_{i.}}{\sqrt{(ap_{i.})}}, \qquad x_{.j} = \frac{a_{.j} - ap_{.j}}{\sqrt{(ap_{.j})}} \tag{8}
\]

be standardized variables corresponding to the two sets of marginal frequencies. Then since
$x_{..} = 0$, the interaction sum of squares is given by

\[
\sum_{i=1}^{r}\sum_{j=1}^{s} x_{ij}^2 - \sum_{i=1}^{r} x_{i.}^2 - \sum_{j=1}^{s} x_{.j}^2 \tag{9}
\]

and

\[
\begin{aligned}
\sum_{i=1}^{r} x_{i.}^2 &= \sum_{i=1}^{r} \frac{(a_{i.} - ap_{i.})^2}{ap_{i.}} = \chi^2 \text{ for rows}, \\
\sum_{j=1}^{s} x_{.j}^2 &= \sum_{j=1}^{s} \frac{(a_{.j} - ap_{.j})^2}{ap_{.j}} = \chi^2 \text{ for columns}.
\end{aligned} \tag{10}
\]

If we substitute for $p_{i.}$, $p_{.j}$ respectively, the estimates $a_{i.}/a$, $a_{.j}/a$, then the sums of squares
in (10) vanish and (9) gives the usual $\chi^2$ expression for the contingency table. But the
expressions obtained in (6) and (9) are equal, hence $\chi^2$ can be partitioned as in (6).




For example, for a (3 × 3) table the four sets of coefficients are obtained by taking for $L_1$,
$L_2$ on p. 131 the matrices $L(p_{1.}, p_{2.}, p_{3.})$, $L(p_{.1}, p_{.2}, p_{.3})$, where

\[
L(p_1, p_2, p_3) = \begin{bmatrix}
\sqrt{\dfrac{p_2p_1}{p_1(p_1+p_2)}} & -\sqrt{\dfrac{p_1}{p_1+p_2}} & 0 \\[2ex]
\sqrt{\dfrac{p_3p_1}{(p_1+p_2)(p_1+p_2+p_3)}} & \sqrt{\dfrac{p_3p_2}{(p_1+p_2)(p_1+p_2+p_3)}} & -\sqrt{\dfrac{p_1+p_2}{p_1+p_2+p_3}}
\end{bmatrix}.
\]


If in association with the first set we write for the expected frequencies

\[
\begin{aligned}
e_{ij} &= ap_{i.}p_{.j} \quad (i, j = 1, 2), \\
e_{11} + e_{12} &= e_{1.}, \qquad e_{21} + e_{22} = e_{2.}, \\
e_{11} + e_{21} &= e_{.1}, \qquad e_{12} + e_{22} = e_{.2}, \\
E &= \sum e_{ij},
\end{aligned} \tag{11}
\]

on applying this set of coefficients to the standardized frequencies we easily reach

\[
u = \chi = \sqrt{\left\{ \frac{E}{e_{1.}e_{2.}e_{.1}e_{.2}} \right\}} \, \{a_{11}e_{22} + a_{22}e_{11} - a_{12}e_{21} - a_{21}e_{12}\}, \tag{12}
\]

which reduces to the usual expression for the fourfold table if the expected frequencies are
calculated from the margins of the fourfold table itself and not as here from the margins of
the original table.
When the second set of coefficients are applied to the standardized frequencies, it is easy
to verify that the result is the same as performing the following process:
Form the fourfold table

    $a_{11} + a_{21}$  |  $a_{12} + a_{22}$
    $a_{31}$           |  $a_{32}$

and note that the marginal and total expected frequencies are

    row totals:     $E(p_{1.} + p_{2.})/(p_{1.} + p_{2.} + p_{3.})$,   $Ep_{3.}/(p_{1.} + p_{2.} + p_{3.})$
    column totals:  $Ep_{.1}/(p_{.1} + p_{.2})$,   $Ep_{.2}/(p_{.1} + p_{.2})$
    total:          $E = a(p_{.1} + p_{.2})(p_{1.} + p_{2.} + p_{3.})$

Then calculate $\chi$ (= $\chi_{21}$) from the formula (12).
The process is general; in particular, in this example $\chi_{12}$ is similarly obtained from the table

    $a_{11} + a_{12}$  |  $a_{13}$
    $a_{21} + a_{22}$  |  $a_{23}$

and $\chi_{22}$ from

    $a_{11} + a_{12} + a_{21} + a_{22}$  |  $a_{13} + a_{23}$
    $a_{31} + a_{32}$                    |  $a_{33}$




More generally for a (r × s) contingency table, if

\[
\sum_{i=1}^{t} a_{ij} = b_{tj}, \qquad \sum_{j=1}^{t'} a_{ij} = b'_{it'}, \qquad \sum_{i=1}^{t}\sum_{j=1}^{t'} a_{ij} = c_{tt'},
\]

the component fourfold tables are given by

\[
\begin{array}{c|c}
c_{tt'} & b_{t,\,t'+1} \\ \hline
b'_{t+1,\,t'} & a_{t+1,\,t'+1}
\end{array}
\qquad
\begin{aligned}
t &= 1, 2, \ldots, (r-1), \\
t' &= 1, 2, \ldots, (s-1).
\end{aligned} \tag{13}
\]

If the $\chi^2$ from each of these tables be obtained from (12) the total will be equal to the $\chi^2$ of
the whole contingency table. Dr Lancaster has given a formal proof of this.

If any two orthogonal matrices of coefficients of orders r and s whose first rows are
$\sqrt{p_{i.}}$, $\sqrt{p_{.j}}$ (i = 1, 2, ..., r − 1; j = 1, 2, ..., s − 1) are applied to the marginal frequencies and
then combined as indicated in this note, a partition of $\chi^2$ into the contributions from (r − 1)
(s − 1) fourfold tables will result. Thus there must be many possible ways of subdivision of
interest in particular cases, and the subject merits further exploration.
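The general process can be sketched in Python as follows. The example forms each component fourfold table from cumulative sums, takes the expected frequencies from the margins of the original table, and applies formula (12). It is run on the 3 × 3 table of Dr Lancaster's Example 1; the function name `component_chi` is illustrative only.

```python
import math

TABLE = [[3009, 2832, 3008],
         [3047, 3051, 2997],
         [2974, 3038, 3018]]

def component_chi(table, t, u):
    """Signed chi from (12) for rows 1..t against row t+1 and
    columns 1..u against column u+1 (t and u are 1-based)."""
    total = sum(map(sum, table))
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    # Observed fourfold table built from cumulative sums.
    a11 = sum(table[i][j] for i in range(t) for j in range(u))
    a12 = sum(table[i][u] for i in range(t))
    a21 = sum(table[t][j] for j in range(u))
    a22 = table[t][u]
    # Expected frequencies from the margins of the ORIGINAL table.
    r1, r2 = sum(row_tot[:t]), row_tot[t]
    c1, c2 = sum(col_tot[:u]), col_tot[u]
    e = [[r * c / total for c in (c1, c2)] for r in (r1, r2)]
    e_r1, e_r2 = e[0][0] + e[0][1], e[1][0] + e[1][1]
    e_c1, e_c2 = e[0][0] + e[1][0], e[0][1] + e[1][1]
    big_e = e_r1 + e_r2
    bracket = a11 * e[1][1] + a22 * e[0][0] - a12 * e[1][0] - a21 * e[0][1]
    return math.sqrt(big_e / (e_r1 * e_r2 * e_c1 * e_c2)) * bracket

r, s = len(TABLE), len(TABLE[0])
chis = {(t, u): component_chi(TABLE, t, u)
        for t in range(1, r) for u in range(1, s)}

# Usual chi-square of the whole table, for comparison with the sum of squares.
total = sum(map(sum, TABLE))
usual = sum((TABLE[i][j] - sum(TABLE[i]) * sum(col) / total) ** 2
            / (sum(TABLE[i]) * sum(col) / total)
            for i in range(r) for j, col in enumerate(zip(*TABLE)))

print({k: round(v, 5) for k, v in chis.items()})
print(round(sum(v * v for v in chis.values()), 5), round(usual, 5))
```

Because the expectations come from the margins of the whole table, the squared components add exactly to the usual $\chi^2$, in contrast to fourfold tables that use their own margins.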


REFERENCE 


Irwin, J. O. (1942). On the distribution of a weighted estimate of variance and on analysis of variance 
in certain cases of unequal weighting. J. R. Statist. Soc. 105, 115. 

























[ 135 ] 


THE FIRST AND SECOND MOMENTS OF SOME PROBABILITY
DISTRIBUTIONS ARISING FROM POINTS ON A LATTICE
AND THEIR APPLICATION


By P. V. KRISHNA IYER 
Department of the Design and Analysis of Scientific Experiment, University of Oxford 


1. INTRODUCTION


A lattice of points is defined to be a rectangular array of points in any number of dimensions.
If each point is assigned at random to one of k colours, discussion of the numbers of joins
between points of the same colour, or between points of different colours, will involve many
interesting and important probability distributions. The author (1949a) has examined, for
both free and non-free sampling, distributions of the number of joins parallel to the axes of
the lattice. In free sampling the chance, $p_r$, of a point taking the rth colour is fixed, and,
subject to the condition $\sum_{r=1}^{k} p_r = 1$, is independent of the colour of the other points. In non-free
sampling, fixed numbers of points, say $n_1, n_2, \ldots, n_k$, such that $\sum n_r$ = total number of
points in the lattice, belong respectively to the colours black, white, etc.

It has been shown in another communication (1949b) that the results developed earlier
(1949a) can be used to determine whether a given distribution of diseased plants in a rect- 
angular plantation can be considered to be random or not. This is done by comparing either 
(1) the number of joins between two adjacent diseased plants, or (2) the number of joins 
between healthy and diseased plants adjacent to each other, with its expectation for the 
observed numbers of diseased and healthy plants in the field. The procedure there recom- 
mended does not take the diagonal joins into consideration. Since the spread of disease is 
not restricted to the directions of the axes of the lattice, it is very desirable to have a test 
based on all the possible joins between adjacent points in the lattice. 

Todd (1940) suggested that the number of diseased ‘doublets’, ‘triplets’, or ‘quadruplets’ 
(that is to say sets of 2, 3 and 4 adjacent diseased plants) might be used as a basis of test of 
randomness. He included diagonal adjacency, and proposed to test the significance of the 
observed number from its expectation by assuming the distribution to be binomial. Finney 
(1947) pointed out that the true variances of these distributions might be much greater than 
for binomial distribution, because of the non-independence of the individual doublets, 
triplets and quadruplets. For the last two he produced numerical evidence that Todd’s 
procedure seriously underestimated the variance, so that the test in the form proposed by 
Todd would often give spurious indications of non-randomness. 

In the nomenclature of the present paper, the number of doublets is the number of joins 
between adjacent black points, and the mean and the variance of the distribution are given 
by equations (2-1-5) and (2-1-6) below. Rather surprisingly, the true variance is less than 
that calculated from the binomial assumption. This can be seen by examining the difference
between the formulae for the true and binomial variances. This is also true for joins parallel 
to the axes. 










The table below compares the true and the binomial variances with those obtained in 
Finney’s sampling experiment for different numbers of diseased plants. 


Variances for the distribution of doublets in a 10 × 10 lattice

    No. of                             Variance
    diseased plants      True      Binomial      Finney's sampling
           2             0.064       0.064            0.08
           5             0.620       0.643            0.88
          10             2.649       2.894            1.81
          15             5.697       6.753            4.90
          20             9.598      12.220           10.57
          50            36.141      78.789             -




















The present paper gives the first and the second moments for the probability distributions
of the number of joins, taken in all possible ways, between adjacent points which are of
(1) the same colour, (2) two specified colours, and (3) different colours, for free and non-free
sampling, in two- and n-dimensional lattices consisting of m × n and $l_1 \times l_2 \times \ldots \times l_n$ points
respectively belonging to k colours. The results for free sampling in the case of two- and
three-dimensional lattices have been given in Nature (Krishna Iyer, 1948). No attempt has been
made here to discuss the higher moments and cumulants.

It has been established (1949a) that the cumulants of the different distributions for the
number of joins, taken along the axes of the lattice, are linear expressions in the number of
points on the sides and in the lattice. That is, for an m × n rectangular lattice, the $\kappa$'s are
linear expressions in mn, m and n; for an $l_1 \times l_2 \times \ldots \times l_n$, n-dimensional lattice they involve
$a_n, a_{n-1}, \ldots, a_1$ in the first degree, where the a's are symmetric functions in l's as defined by
MacMahon (1915). This is also true for the distributions considered in the present paper.
Hence, for all the distributions, the $\gamma$'s tend to zero when $l_1, l_2, \ldots, l_n$ approach infinity. That
is, the distributions tend to the normal form when $l_1, l_2, \ldots, l_n$ tend to infinity.

The test proposed by Todd may now be modified, however, so as to use the variance formula 
of the present paper (equation (2-1-6)), and will then provide a sound test of the deviations 
from randomness of disease incidence amongst the plants of a rectangular plantation. This 
is illustrated in § 4. 

2. TWO-DIMENSIONAL LATTICE 


2-1. First and second moments for the distribution of black-black 
joins for two or more colours 

It has already been shown (1949a) that the distribution of black-black joins remains the
same whatever be the number of colours in the lattice.

(a) Free sampling

It has been shown (1949a) that the rth factorial moment is r! times the sum of the
expectations of the different ways of obtaining r black-black joins in the lattice.

Let an m × n lattice consist of mn points of two colours, say black and white, with
probabilities p and q = 1 − p respectively. The expected number of black-black joins is given by

\[
\mu'_1 = A_2 p^2, \tag{2-1-1}
\]





where $A_2$ is the number of joins in the lattice. By considering the different ways in which the
mn points of the lattice can be associated with the points surrounding them (dealing
separately with the four corner points, the 2(m + n − 4) border points and the (m − 2)(n − 2)
interior points), we see that $2A_2$ can be expressed as

\[
{}_3C_1 \cdot 4 + {}_5C_1 \cdot 2(m + n - 4) + {}_8C_1 \cdot (m - 2)(n - 2),
\]

so that
\[
A_2 = 4b - 3a + 2, \tag{2-1-2}
\]
where b = mn, a = m + n.

Two black-black joins can be obtained from (1) three adjacent black points, (2) four black
points divided into two pairs. In (1) two black-black joins can be formed in $B_2$ ways, where

\[
B_2 = {}_3C_2 \cdot 4 + {}_5C_2 \cdot 2(m + n - 4) + {}_8C_2 \cdot (m - 2)(n - 2) = 4(7b - 9a + 11).
\]

The total number of ways of obtaining two joins is $\tfrac{1}{2}A_2(A_2 - 1)$, and so the number in (2)
is $\tfrac{1}{2}\{A_2(A_2 - 1) - 2B_2\}$. Since the chances of having three and four black points are $p^3$ and
$p^4$ respectively, we have

\[
\mu'_{[2]} = 2B_2 p^3 + \{A_2(A_2 - 1) - 2B_2\} p^4. \tag{2-1-3}
\]

Adding $\mu'_1$ and subtracting $\mu'^2_1$ we get

\[
\mu_2 = A_2 p^2 + 2B_2 p^3 - (A_2 + 2B_2) p^4. \tag{2-1-4}
\]


(b) Non-free sampling

As explained in the previous paper (1949a), the moments about the origin can be written
down from the corresponding results for free sampling by substituting $n_1^{(r)}/b^{(r)}$ for $p^r$, where
$n_1$ is the number of black points in the lattice and $n^{(r)}$ is written for n(n − 1)...(n − r + 1).
This gives

\[
\mu'_{1(n_1, n_2)} = A_2 n_1^{(2)}/b^{(2)}, \tag{2-1-5}
\]
\[
\mu_{2(n_1, n_2)} = A_2 n_1^{(2)}/b^{(2)} + 2B_2 n_1^{(3)}/b^{(3)} + \{A_2(A_2 - 1) - 2B_2\} n_1^{(4)}/b^{(4)} - \{A_2 n_1^{(2)}/b^{(2)}\}^2, \tag{2-1-6}
\]

where $\mu_{r(n_1, n_2)}$ denotes the rth moment about the mean for non-free sampling with $n_1$ black
and $n_2$ white points.
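Formulae (2-1-5) and (2-1-6) can be evaluated directly. The Python sketch below computes the mean and variance of the number of black-black joins (doublets) for non-free sampling. The function `binomial_variance` rests on an assumption made here, not stated in the text: that the binomial variance treats the $\binom{n_1}{2}$ pairs of diseased plants as independent trials, each adjacent with probability $A_2/\binom{b}{2}$. That form is adopted because it reproduces the Binomial column of the table in § 1.

```python
from math import comb

def falling(n, r):
    """n^(r) = n (n - 1) ... (n - r + 1)."""
    out = 1
    for i in range(r):
        out *= n - i
    return out

def joins(m, n):
    """A_2 and B_2 for an m x n lattice, diagonal joins included."""
    b, a = m * n, m + n
    return 4 * b - 3 * a + 2, 4 * (7 * b - 9 * a + 11)

def bb_moments(m, n, n1):
    """Mean (2-1-5) and variance (2-1-6) of the number of black-black joins."""
    A, B = joins(m, n)
    b = m * n
    f = lambda r: falling(n1, r) / falling(b, r)
    mean = A * f(2)
    var = A * f(2) + 2 * B * f(3) + (A * (A - 1) - 2 * B) * f(4) - mean ** 2
    return mean, var

def binomial_variance(m, n, n1):
    """Assumed binomial form: C(n1, 2) trials with chance A_2 / C(b, 2)."""
    A, _ = joins(m, n)
    p = A / comb(m * n, 2)
    return comb(n1, 2) * p * (1 - p)

for n1 in (2, 5, 15, 20, 50):
    print(n1, round(bb_moments(10, 10, n1)[1], 3),
          round(binomial_variance(10, 10, n1), 3))
```

For every number of diseased plants the true variance is the smaller of the two, in line with the remark in the introduction.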


2-2. First and second moments for the distribution of black-white
joins for two or more colours

(a) Free sampling

The chance of getting a black-white join is $2p_1p_2$, where $p_1$ and $p_2$ are the probabilities of
the points being black and white respectively. The number of joins in an m × n lattice being
$A_2$ (see § 2-1), the expected number of black-white joins is

\[
\mu'_1 = 2A_2 p_1 p_2. \tag{2-2-1}
\]

Two black-white joins can be obtained (1) from three adjacent points (two black, one
white or one black, two white), (2) from two pairs of points (with one black, one white in
each pair). The expectations for (1) and (2) are

\[
B_2 p_1 p_2 (p_1 + p_2) \quad \text{and} \quad 2\{A_2(A_2 - 1) - 2B_2\} p_1^2 p_2^2,
\]

respectively. Therefore

\[
\mu'_{[2]} = 2B_2 p_1 p_2 (p_1 + p_2) + 4\{A_2(A_2 - 1) - 2B_2\} p_1^2 p_2^2, \tag{2-2-2}
\]








and
\[
\mu_2 = 2A_2 p_1 p_2 + 2B_2 p_1 p_2 (p_1 + p_2) - 4(A_2 + 2B_2) p_1^2 p_2^2, \tag{2-2-3}
\]
which reduces to
\[
\mu_2 = 2(A_2 + B_2) p_1 p_2 - 4(A_2 + 2B_2) p_1^2 p_2^2 \tag{2-2-4}
\]
when there are only two colours in the lattice (since then $p_1 + p_2 = 1$).

(b) Non-free sampling

The moments about the origin can be got (1949a) by substituting $n_1^{(r)} n_2^{(s)}/b^{(r+s)}$ for $p_1^r p_2^s$ in
the corresponding moments in (a). This gives

\[
\mu'_{1(n_1, n_2)} = 2A_2 n_1 n_2/b^{(2)}, \tag{2-2-5}
\]
\[
\begin{aligned}
\mu_{2(n_1, n_2)} = {} & 2A_2 n_1 n_2/b^{(2)} + 2B_2 n_1 n_2 (n_1 + n_2 - 2)/b^{(3)} \\
& + 4\{A_2(A_2 - 1) - 2B_2\} n_1^{(2)} n_2^{(2)}/b^{(4)} - 4\{A_2 n_1 n_2/b^{(2)}\}^2.
\end{aligned} \tag{2-2-6}
\]

For two colours ($n_1 + n_2 = b$)

\[
\begin{aligned}
\mu_{2(n_1, n_2)} = {} & [2(A_2 + B_2)/b^{(2)} - 4\{A_2(A_2 - 1) - 2B_2\}(b - 1)/b^{(4)}]\, n_1 n_2 \\
& - 4[(A_2 + 2B_2)/b^{(4)} - 2A_2^2(2b - 3)/(b^{(2)} b^{(4)})]\, n_1^2 n_2^2.
\end{aligned} \tag{2-2-7}
\]
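That (2-2-7) is only a rearrangement of (2-2-6) when $n_1 + n_2 = b$ can be confirmed with exact rational arithmetic. A sketch in Python using `fractions.Fraction`; the function names are illustrative, and the lattice of 20 × 15 points with 60 black points is the one used later in § 4.

```python
from fractions import Fraction

def falling(n, r):
    """n^(r) = n (n - 1) ... (n - r + 1)."""
    out = 1
    for i in range(r):
        out *= n - i
    return out

def bw_moments(A, B, b, n1, n2):
    """Mean (2-2-5) and variance (2-2-6) of the number of black-white joins."""
    mean = Fraction(2 * A * n1 * n2, falling(b, 2))
    var = (mean
           + Fraction(2 * B * n1 * n2 * (n1 + n2 - 2), falling(b, 3))
           + Fraction(4 * (A * (A - 1) - 2 * B) * falling(n1, 2) * falling(n2, 2),
                      falling(b, 4))
           - mean ** 2)
    return mean, var

def bw_variance_two_colours(A, B, b, n1):
    """Variance from (2-2-7), valid when n1 + n2 = b."""
    n2 = b - n1
    K = A * (A - 1) - 2 * B
    c1 = Fraction(2 * (A + B), falling(b, 2)) - Fraction(4 * K * (b - 1), falling(b, 4))
    c2 = (Fraction(A + 2 * B, falling(b, 4))
          - Fraction(2 * A * A * (2 * b - 3), falling(b, 2) * falling(b, 4)))
    return c1 * n1 * n2 - 4 * c2 * (n1 * n2) ** 2

m, n = 20, 15
b, a = m * n, m + n
A, B = 4 * b - 3 * a + 2, 4 * (7 * b - 9 * a + 11)
mean, var = bw_moments(A, B, b, 60, b - 60)
print(float(mean), float(var), float(bw_variance_two_colours(A, B, b, 60)))
```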


2-3. First and second moments for the distribution of the total number of joins
between points of different colours for three colours

(a) Free sampling

It can easily be seen that
\[
\mu'_1 = 2A_2 \sum p_r p_s. \tag{2-3-1}
\]
The coefficients of $\sum p_r p_s$ and $\sum p_r^2 p_s^2$ ($1 \le r < s \le 3$) in the second moment are respectively the
same as those of $p_1 p_2$ and $p_1^2 p_2^2$ in $\mu_2$ for two colours. The term in $p_1 p_2 p_3$ arises from two joins
formed out of (1) three adjacent points of three colours, and (2) four points of three colours
divided into two groups, each having two adjacent points of different colours. The
contributions of (1) and (2) above in the second factorial moment are

\[
6B_2 p_1 p_2 p_3 \quad \text{and} \quad 8\{A_2(A_2 - 1) - 2B_2\} p_1 p_2 p_3,
\]

respectively. Therefore the coefficient of $p_1 p_2 p_3$ in $\mu'_{[2]}$ is

\[
[6B_2 + 8(A_2^2 + C_2)], \tag{2-3-2}
\]

where $C_2 = -(A_2 + 2B_2)$. Subtracting now the coefficient of $p_1 p_2 p_3$ in $(\mu'_1)^2$, i.e. $4A_2^2(\sum p_r p_s)^2$,
from (2-3-2) we get $2(4C_2 + 3B_2)$.
Thus for three colours

\[
\mu_2 = 2(A_2 + B_2) \sum p_r p_s - 2(4A_2 + 5B_2) p_1 p_2 p_3 - 4(A_2 + 2B_2) \sum p_r^2 p_s^2. \tag{2-3-3}
\]

(b) Non-free sampling

\[
\mu'_{1(n_1, n_2, n_3)} = 2A_2 \sum n_r n_s/b^{(2)}, \tag{2-3-4}
\]
\[
\mu_{2(n_1, n_2, n_3)} = P \sum n_r n_s + Q\, n_1 n_2 n_3 + R \sum n_r^2 n_s^2, \tag{2-3-5}
\]

where P and R are the same as the coefficients of $n_1 n_2$ and $n_1^2 n_2^2$ in $\mu_{2(n_1, n_2)}$ for two colours
given in (2-2-7) and

\[
-Q = 2(4A_2 + 5B_2)/b^{(3)} - 12\{A_2(A_2 - 1) - 2B_2\}/b^{(4)} - 8A_2^2/\{b^{(3)}(b - 1)\}.
\]
2-4. First and second moments for the distribution of the total number of joins between
points of different colours for four or more colours

(a) Free sampling

From the discussions in the previous paper (1949a) it can be seen that

\[
\mu'_1 = 2A_2 \sum p_r p_s, \tag{2-4-1}
\]
\[
\mu_2 = 2(A_2 + B_2) \sum p_r p_s - 2(4A_2 + 5B_2) \sum p_r p_s p_t - 4(A_2 + 2B_2) \left[\sum p_r^2 p_s^2 - 2\sum p_r p_s p_t p_u\right]. \tag{2-4-2}
\]





(b) Non-free sampling

\[
\mu'_{1(n_1, n_2, n_3, n_4)} = 2A_2 \sum n_r n_s/b^{(2)}, \tag{2-4-3}
\]
\[
\mu_{2(n_1, n_2, n_3, n_4)} = P \sum n_r n_s + Q \sum n_r n_s n_t + R \sum n_r^2 n_s^2 + S \sum n_r n_s n_t n_u, \tag{2-4-4}
\]

where P, Q and R are the same as given in § 2-3 and S = −2R.


3. n-DIMENSIONAL LATTICE

The discussions in § 2 show that the first and the second moments for all the distributions
can be written down if the following quantities for the $l_1 \times l_2 \times \ldots \times l_n$ lattice are known:
(1) $A_n$, the number of joins, and (2) $B_n$, the number of ways of forming two joins from three
adjacent points.

Now, $A_n$, $B_n$ for an $l_1 \times l_2 \times \ldots \times l_n$ lattice can be calculated by extending the arguments given
for the two-dimensional lattice leading to (2-1-2) and (2-1-3).

\[
A_n = \tfrac{1}{2}\{(3^n - 1)a'_n + 2(2 \cdot 3^{n-1} - 1)a'_{n-1} + 2^2(2^2 \cdot 3^{n-2} - 1)a'_{n-2} + \ldots + 2^n(2^n - 1)a'_0\},
\]
\[
\begin{aligned}
B_n = \tfrac{1}{2}\{ & (3^n - 1)(3^n - 2)a'_n + 2(2 \cdot 3^{n-1} - 1)(2 \cdot 3^{n-1} - 2)a'_{n-1} \\
& + 2^2(2^2 \cdot 3^{n-2} - 1)(2^2 \cdot 3^{n-2} - 2)a'_{n-2} + \ldots + 2^n(2^n - 1)(2^n - 2)a'_0\},
\end{aligned}
\]

where the a′'s* are symmetric functions in $(l_1 - 2), (l_2 - 2), \ldots, (l_n - 2)$ as defined by
MacMahon (1915).

In terms of the a's in $l_1, l_2, \ldots, l_n$, $A_n$ and $B_n$ reduce to

\[
A_n = \tfrac{1}{2}\{(3^n - 1)a_n - 2 \cdot 3^{n-1}a_{n-1} + 2^2 \cdot 3^{n-2}a_{n-2} - \ldots + (-1)^n \cdot 2^n a_0\},
\]
\[
\begin{aligned}
B_n = \tfrac{1}{2}\{ & (3^n - 1)(3^n - 2)a_n - 2(3^{2n-2} \cdot 5 - 3^n)a_{n-1} \\
& + 2^2(3^{2n-4} \cdot 5^2 - 3^{n-1})a_{n-2} - \ldots + (-1)^n 2^n (5^n - 3)a_0\}.
\end{aligned}
\]
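The expressions for $A_n$ and $B_n$ in terms of the a's can be checked by direct enumeration on small lattices. In the Python sketch below, `brute_force` counts joins and pairs of joins meeting at a point, diagonals included. `closed_form` assumes that the general term of the series is $(-1)^k 2^k 3^{n-k} a_{n-k}$ for $A_n$ and $(-1)^k 2^k (3^{2n-2k} 5^k - 3^{n-k+1}) a_{n-k}$ for $B_n$ (k ≥ 1); this is inferred from the terms displayed above rather than quoted from the paper.

```python
from itertools import combinations, product
from math import comb, prod

def brute_force(sides):
    """Count joins (A) and pairs of joins sharing a point (B) by enumeration."""
    steps = [s for s in product((-1, 0, 1), repeat=len(sides)) if any(s)]
    twice_a = b = 0
    for point in product(*(range(l) for l in sides)):
        degree = sum(all(0 <= x + d < l for x, d, l in zip(point, s, sides))
                     for s in steps)
        twice_a += degree            # every join is counted from both ends
        b += comb(degree, 2)         # two joins meeting at this point
    return twice_a // 2, b

def elementary(sides, k):
    """Elementary symmetric function a_k of the sides (a_0 = 1)."""
    return sum(prod(c) for c in combinations(sides, k))

def closed_form(sides):
    """A_n and B_n from the series in a_n, a_{n-1}, ..., a_0."""
    n = len(sides)
    a = [elementary(sides, k) for k in range(n + 1)]
    big_a = (3 ** n - 1) * a[n]
    big_b = (3 ** n - 1) * (3 ** n - 2) * a[n]
    for k in range(1, n + 1):
        sign = (-1) ** k
        big_a += sign * 2 ** k * 3 ** (n - k) * a[n - k]
        big_b += sign * 2 ** k * (3 ** (2 * n - 2 * k) * 5 ** k
                                  - 3 ** (n - k + 1)) * a[n - k]
    return big_a // 2, big_b // 2

for sides in [(20, 15), (3, 3, 3), (3, 4, 5)]:
    print(sides, brute_force(sides), closed_form(sides))
```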

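We may now give the first and the second moments for the various distributions dealt
with in § 2.

3-1. Black-black joins for two or more colours

(a) Free sampling

\[
\mu'_1 = A_n p_1^2, \tag{3-1-1}
\]
\[
\mu_2 = A_n p_1^2 + 2B_n p_1^3 - (A_n + 2B_n) p_1^4. \tag{3-1-2}
\]

(b) Non-free sampling

\[
\mu'_{1(n_1, n_2)} = A_n n_1^{(2)}/a_n^{(2)}, \tag{3-1-3}
\]
\[
\mu_{2(n_1, n_2)} = A_n n_1^{(2)}/a_n^{(2)} + 2B_n n_1^{(3)}/a_n^{(3)} + \{A_n(A_n - 1) - 2B_n\} n_1^{(4)}/a_n^{(4)} - \{A_n n_1^{(2)}/a_n^{(2)}\}^2. \tag{3-1-4}
\]

3-2. Black-white joins for k colours

(a) Free sampling

\[
\mu'_1 = 2A_n p_1 p_2, \tag{3-2-1}
\]
\[
\mu_2 = 2A_n p_1 p_2 + 2B_n p_1 p_2 (p_1 + p_2) - 4(A_n + 2B_n) p_1^2 p_2^2. \tag{3-2-2}
\]

(b) Non-free sampling

\[
\mu'_{1(n_1, n_2)} = 2A_n n_1 n_2/a_n^{(2)}, \tag{3-2-3}
\]
\[
\begin{aligned}
\mu_{2(n_1, n_2)} = {} & 2A_n n_1 n_2/a_n^{(2)} + 2B_n n_1 n_2 (n_1 + n_2 - 2)/a_n^{(3)} \\
& + 4\{A_n(A_n - 1) - 2B_n\} n_1^{(2)} n_2^{(2)}/a_n^{(4)} - 4\{A_n n_1 n_2/a_n^{(2)}\}^2.
\end{aligned} \tag{3-2-4}
\]

* $a'_r = \sum (l_1 - 2)(l_2 - 2)(l_3 - 2) \ldots (l_r - 2)$, $a'_0 = 1$.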








3-3. Total number of joins between points of different colours for k colours

(a) Free sampling

\[
\mu'_1 = 2A_n \sum p_r p_s, \tag{3-3-1}
\]
\[
\mu_2 = 2(A_n + B_n) \sum p_r p_s - 2(4A_n + 5B_n) \sum p_r p_s p_t - 4(A_n + 2B_n) \left(\sum p_r^2 p_s^2 - 2\sum p_r p_s p_t p_u\right). \tag{3-3-2}
\]

(b) Non-free sampling

\[
\mu'_{1(n_1, n_2, \ldots, n_k)} = 2A_n \sum n_r n_s/a_n^{(2)}, \tag{3-3-3}
\]
\[
\begin{aligned}
\mu_{2(n_1, n_2, \ldots, n_k)} = {} & [2(A_n + B_n)/a_n^{(2)} - 4\{A_n(A_n - 1) - 2B_n\}(a_n - 1)/a_n^{(4)}] \sum n_r n_s \\
& - [2(4A_n + 5B_n)/a_n^{(3)} - 12\{A_n(A_n - 1) - 2B_n\}/a_n^{(4)} - 8A_n^2/\{a_n^{(3)}(a_n - 1)\}] \sum n_r n_s n_t \\
& - 4[(A_n + 2B_n)/a_n^{(4)} - 2A_n^2(2a_n - 3)/(a_n^{(2)} a_n^{(4)})] \left[\sum n_r^2 n_s^2 - 2\sum n_r n_s n_t n_u\right].
\end{aligned} \tag{3-3-4}
\]


4. APPLICATION 


The distributions discussed in the present paper suggest that a test of significance for the 
random distribution of diseased plants in a rectangular plantation can be made by finding 
the standardized deviate for (1) the number of joins between adjacent healthy and diseased 
plants, and (2) the number of joins between adjacent diseased plants. (2) corresponds to 


[Fig. 1. Positions of the diseased plants (×) in the field of 20 × 15 plants.]


Todd’s method with the modification proposed in the introduction. The two methods are 
illustrated below for a field consisting of 20 x 15 plants, 60 of which are diseased and dis- 
tributed in the manner shown in Fig. 1. 

The numbers of diseased-healthy and diseased-diseased joins in the above configuration 
are 365 and 42 respectively. Their expected values and variances obtained by using the 







expressions (2-2-5), (2-2-6), (2-1-5) and (2-1-6), are 352.21 and 139.56, and 43.29 and 30.58
respectively. Thus the standardized deviates for the diseased-healthy and diseased-diseased
joins are

\[
\frac{365 - 352.21}{\sqrt{139.56}} = 1.08 \quad \text{and} \quad \frac{42 - 43.29}{\sqrt{30.58}} = -0.23
\]

respectively. This shows that the distribution of the diseased plants can be taken to be
random.
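The test of this section can be written out as a short calculation. The Python sketch below recomputes the expectations, variances and standardized deviates for the field of 20 × 15 plants with 60 diseased, taking the observed numbers of joins (365 and 42) from the text. Variable names are illustrative.

```python
from math import sqrt

def falling(n, r):
    """n^(r) = n (n - 1) ... (n - r + 1)."""
    out = 1
    for i in range(r):
        out *= n - i
    return out

m, n = 20, 15
b, a = m * n, m + n
A = 4 * b - 3 * a + 2                  # number of joins, (2-1-2)
B = 4 * (7 * b - 9 * a + 11)           # pairs of joins sharing a point
n1, n2 = 60, b - 60                    # diseased and healthy plants
K = A * (A - 1) - 2 * B

# Diseased-diseased joins: (2-1-5) and (2-1-6).
dd_mean = A * falling(n1, 2) / falling(b, 2)
dd_var = (dd_mean + 2 * B * falling(n1, 3) / falling(b, 3)
          + K * falling(n1, 4) / falling(b, 4) - dd_mean ** 2)

# Diseased-healthy joins: (2-2-5) and (2-2-6).
dh_mean = 2 * A * n1 * n2 / falling(b, 2)
dh_var = (dh_mean + 2 * B * n1 * n2 * (n1 + n2 - 2) / falling(b, 3)
          + 4 * K * falling(n1, 2) * falling(n2, 2) / falling(b, 4)
          - dh_mean ** 2)

z_dh = (365 - dh_mean) / sqrt(dh_var)
z_dd = (42 - dd_mean) / sqrt(dd_var)
print(round(dh_mean, 2), round(dh_var, 2), round(z_dh, 2))
print(round(dd_mean, 2), round(dd_var, 2), round(z_dd, 2))
```

Both deviates are well inside the usual limits of a normal deviate, which is the basis for treating the distribution of diseased plants as random.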





5. SUMMARY 


The paper gives the first and the second moments for the distribution of the number of joins
between adjacent points of (1) the same colour, (2) two different specified colours, and
(3) different colours, for m × n and $l_1 \times l_2 \times \ldots \times l_n$ lattices of points of k colours, for free and
non-free sampling. The joins considered include all the possible ones between adjacent points,
including the diagonals. All the distributions tend to the normal form when $l_1, l_2, \ldots, l_n$ tend
to infinity. These results have been applied for testing the departure from randomness of
a given distribution of diseased plants in a rectangular plantation.


The author’s thanks are due to Dr D. J. Finney and Dr John Wishart for going through 
the manuscript and for making very useful suggestions for improving the paper. 


REFERENCES 


Finney, D. J. (1947). The significance of associations in square-point lattice. J. R. Statist. Soc. Suppl.
9, 99.

Krishna Iyer, P. V. (1948). Random association of points on a lattice. Nature, Lond., 162, 333.

Krishna Iyer, P. V. (1949a). The theory of probability distributions of points on a lattice. Ann.
Math. Statist. (in press).

Krishna Iyer, P. V. (1949b). Random association of points on a lattice. J. Indian Soc. Agric.
Statist. (in press).

MacMahon, P. A. (1915). Combinatory Analysis, 1. Cambridge University Press.

Todd, H. (1940). Note on random associations in a square-point lattice. J. R. Statist. Soc. Suppl.
7, 78.













PROBABILITY TABLES FOR THE RANGE 
By E. J. GUMBEL, New York 


In the following, the asymptotic distributions of the range and the midrange for any un- 
limited symmetrical distribution of the exponential type will be compared with Elfving’s 
distribution of the probability transformation of the normal range. The asymptotic pro- 
bability and distribution of the reduced range are given to five significant decimal places, 
the reduced range being a linear function of the range proper. 


1. ELFVING’S DISTRIBUTION 


Elfving (1947) has given the following approach to the distribution of the range: Let $x_1$
and $x_n$ be the smallest and the largest among $n$ observations taken from a known
symmetrical distribution $\phi(x)$, and let $\Phi(x)$ be the probability of a value equal to or less than $x$.
The introduction of two new variates $\xi$ and $\eta$ defined by
$$\xi = 2n\sqrt{\Phi(x_1)\,[1-\Phi(x_n)]}, \qquad \eta = \tfrac12 \lg\frac{\Phi(x_1)}{1-\Phi(x_n)} \tag{1}$$
leads to the joint asymptotic distribution $f_1(\xi, \eta)$, where
$$f_1(\xi, \eta) = \tfrac12\,\xi\, e^{-\xi\cosh\eta}. \tag{2}$$
Integration over $\eta$ yields the distribution of $\xi$
$$f(\xi) = \xi K_0(\xi), \tag{3}$$
where $K_0$ is a Bessel function in the designation of the British Association Mathematical
Tables (1937). The author shows for the normal distribution that $\xi$ converges in probability to
$$\xi = 2n\,[1-\Phi(\tfrac12 w)], \tag{4}$$
where
$$w = x_n - x_1 \tag{5}$$
is the range. Thus formula (3) is the distribution of the probability integral transformation
of the normal range. The tables for the distribution $f(\xi)$ and the probability $F(\xi)$ may be used
to check an observed distribution of ranges provided that the analytical form of the initial
distribution $\phi(x)$, its parameters and the sample size $n$ are known.


2. THE ASYMPTOTIC DISTRIBUTION OF THE RANGE 


Instead of the probability integral transformation $\xi$ we now consider the range $w$ proper.
Let $\phi(x)$ be any symmetrical unlimited distribution of the exponential type, let $u$ be the
expected largest value defined by
$$\Phi(u) = 1 - 1/n, \tag{6}$$
and let a coefficient $\alpha$ be defined by
$$\alpha = n\,\phi(u); \tag{7}$$
finally let the quantity $R$ defined by the linear transformation
$$R = \alpha(w - 2u) \tag{8}$$
be termed the reduced range. If the sample size $n$ is sufficiently large, the extremes $x_1$ and $x_n$
are independent (Gumbel, 1946), and the distribution
$$\psi(R) = \Psi'(R) \tag{9}$$





of the reduced range is obtained (Gumbel, 1947a, b) from the convolution of the limiting
distributions of the largest and of the smallest values given by Fisher & Tippett (1928) as
being
$$\psi(R) = e^{-R}\int_{-\infty}^{+\infty} \exp\left[-e^{-y} - e^{y-R}\right] dy. \tag{10}$$
This distribution is subject to Bessel's equation
$$\Psi''(R) + \Psi'(R) - e^{-R}\,\Psi(R) = 0, \tag{11}$$
which leads to
$$\psi(R) = 2e^{-R} K_0(2e^{-R/2}). \tag{12}$$
This distribution is clearly asymptotic since it does not contain the sample size $n$, and it is
distribution free, except for the conditions imposed on $\phi(x)$, since any trace of the initial
distribution $\phi(x)$ has disappeared. The distribution $f_2(w)$ of the range $w$ itself is obtained by
the usual procedure, as
$$f_2(w) = \alpha\,\psi(R). \tag{13}$$
The two parameters $\alpha$ and $u$ in (8) and (13) depend upon the initial distribution and the
sample size. Since the mean reduced range $\bar R$ is the range of the reduced means, and since the
variance $\sigma_R^2$ of the reduced range is the sum of the variances of the reduced extremes, we have
$$\bar R = 2\gamma, \qquad \sigma_R^2 = \tfrac13\pi^2. \tag{14}$$
Thus the two parameters $\alpha$ and $u$ may be estimated from the observed mean range and the
observed standard deviation of the range with the aid of (8) and (14).
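
The estimation step just described can be sketched as follows. This is a minimal illustration, not the author's computing procedure: the function name is hypothetical, and $\gamma$ is taken to be Euler's constant, an assumption consistent with the mean reduced range 1.1544 quoted later in the paper. From (8), $w = 2u + R/\alpha$, so that by (14) the mean range is $2u + 2\gamma/\alpha$ and the standard deviation of the range is $\pi/(\alpha\sqrt{3})$.

```python
import math

EULER_GAMMA = 0.5772156649015329  # assumed meaning of gamma in (14)

def estimate_range_parameters(mean_range, sd_range):
    """Moment estimates of (alpha, u) from the observed mean and s.d. of the range.

    Uses mean(w) = 2u + 2*gamma/alpha and sd(w) = pi / (alpha*sqrt(3)),
    which follow from (8) and (14).
    """
    alpha = math.pi / (math.sqrt(3.0) * sd_range)
    u = 0.5 * (mean_range - 2.0 * EULER_GAMMA / alpha)
    return alpha, u

# Round-trip check with made-up parameter values alpha = 2, u = 3.
true_alpha, true_u = 2.0, 3.0
mean_w = 2.0 * true_u + 2.0 * EULER_GAMMA / true_alpha
sd_w = math.pi / (true_alpha * math.sqrt(3.0))
alpha_hat, u_hat = estimate_range_parameters(mean_w, sd_w)
print(alpha_hat, u_hat)
```

In practice `mean_w` and `sd_w` would be replaced by the mean and standard deviation of an observed set of ranges.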


The generating function $G_2(t)$ of the reduced range (Gumbel, 1944) obtained from the
generating function of the extremes is
$$G_2(t) = \Gamma^2(1-t), \tag{15}$$
and the two betas are
$$\beta_1 = 0.64928, \qquad \beta_2 - 3 = 1.2, \tag{16}$$
i.e. one-half of the corresponding numbers for the largest value given by Fisher & Tippett
(1928). These values allow us to complete the $\beta_1$, $\beta_2$ diagram traced by Pearson (1926), for
normal extremes and normal ranges for $2 \le n \le 1000$. Fig. 1 shows that the asymptotic
values are situated on straight lines extrapolated from the last calculated values. The
asymptote is more quickly reached for the range than for the extremes.

The distribution (13) may be used to check an observed distribution of ranges, provided 
that the initial distribution is known to be symmetrical, unlimited and of the exponential 
type, and that the sample size is large enough. 

We do not need to know the sample size $n$. Furthermore, the knowledge of the analytical
form of the initial distribution and its parameters is not required, since the parameters of the
distribution of the range may be estimated from the observations themselves. These properties
mark the essential differences between the author's theory and the method developed
by Elfving.


3. RELATION BETWEEN THE TWO SOLUTIONS 


According to Elfving, the distribution of the probability integral transformation of the 
range is given by equations (3), (4) and (5), whereas the author’s method leads to a dis- 
tribution of the range proper, given by equations (8), (12) and (13). 

The question arises of how these two results are related. To this end we establish the
asymptotic nature of Elfving's variates $\xi$ and $\eta$, for a symmetrical unlimited distribution of
the exponential type. It has been shown (Gumbel, 1935), that under these conditions the
probabilities $\Phi(x_1)$ and $\Phi(x_n)$ of the smallest and of the largest values converge towards
$$\Phi(x_1) = \frac{1}{n}\, e^{\alpha(x_1+u)}, \qquad 1-\Phi(x_n) = \frac{1}{n}\, e^{-\alpha(x_n-u)}. \tag{17}$$
Then the variate $\xi$ becomes, from (1),
$$\xi = 2e^{-\frac12\alpha(x_n - x_1 - 2u)}$$
or, from (5) and (8),
$$\xi = 2e^{-R/2}, \tag{18}$$
which is an exponential function of the reduced range $R$. This relation is more general
than (4). Since

















































































































$$d\xi = -e^{-R/2}\,dR,$$
the distribution $\psi(R)$ of the reduced range $R$ becomes, from (3) and (18),
$$\psi(R) = 2e^{-R/2} K_0(2e^{-R/2})\, e^{-R/2},$$
which is again formula (12). Thus Elfving's approach yields the same result as the direct
method developed simultaneously (Gumbel, 1947a).

[Fig. 1. Beta coefficients as functions of sample size: (a) for normal ranges, (b) for normal extremes. Scales of $\beta_1$ and $\beta_2$.]


It is worth while to study also the second variate $\eta$. From (1) and (17) we obtain under
the same conditions
$$\eta = \tfrac12\alpha(x_n - u + x_1 + u).$$
This may be written
$$\eta = \tfrac12\alpha v, \tag{19}$$
where
$$v = x_1 + x_n \tag{20}$$

































































Table 1. The asymptotic probability integral, $\Psi(R)$, and the asymptotic
distribution function, $\psi(R)$, of the reduced range, $R$

   R  | Ψ(R)       | ψ(R)       ||  R  | Ψ(R)     | ψ(R)      ||  R   | Ψ(R)       | ψ(R)
      |            |            || 1.5 | 0.62545  | 0.20346   ||  6.5 | 0.9904510  | 0.0080533
      |            |            || 1.6 | 0.64547  | 0.19693   ||  6.6 | 0.9912243  | 0.0074218
      |            |            || 1.7 | 0.66482  | 0.19016   ||  6.7 | 0.9919369  | 0.0068375
 -3.2 | 0.00020396 | 0.00096273 || 1.8 | 0.68349  | 0.18321   ||  6.8 | 0.9925933  | 0.0062974
 -3.1 | 0.00032305 | 0.0014471  || 1.9 | 0.70146  | 0.17613   ||  6.9 | 0.9931977  | 0.0057982
 -3.0 | 0.00049980 | 0.0021244  || 2.0 | 0.71872  | 0.16898   ||  7.0 | 0.9937542  | 0.0053370
 -2.9 | 0.00075618 | 0.0030496  || 2.1 | 0.73526  | 0.16180   ||  7.1 | 0.9942663  | 0.0049111
 -2.8 | 0.0011201  | 0.0042852  || 2.2 | 0.75108  | 0.15464   ||  7.2 | 0.9947375  | 0.0045180
 -2.7 | 0.0016259  | 0.0059003  || 2.3 | 0.76619  | 0.14753   ||  7.3 | 0.9951709  | 0.0041553
 -2.6 | 0.0023152  | 0.0079687  || 2.4 | 0.78059  | 0.14051   ||  7.4 | 0.9955695  | 0.0038207
 -2.5 | 0.0032372  | 0.010566   || 2.5 | 0.79429  | 0.13360   ||  7.5 | 0.9959359  | 0.0035122
 -2.4 | 0.0044486  | 0.013767   || 2.6 | 0.80731  | 0.12683   ||  7.6 | 0.9962727  | 0.0032278
 -2.3 | 0.0060132  | 0.017642   || 2.7 | 0.81966  | 0.12023   ||  7.7 | 0.9965822  | 0.0029658
 -2.2 | 0.0080016  | 0.022253   || 2.8 | 0.83136  | 0.11381   ||  7.8 | 0.9968666  | 0.0027244
 -2.1 | 0.010490   | 0.027649   || 2.9 | 0.84243  | 0.10759   ||  7.9 | 0.9971277  | 0.0025021
 -2.0 | 0.013559   | 0.033864   || 3.0 | 0.85289  | 0.10157   ||  8.0 | 0.9973676  | 0.0022974
 -1.9 | 0.017291   | 0.040915   || 3.1 | 0.86275  | 0.095767  ||  8.1 | 0.9975878  | 0.0021091
 -1.8 | 0.021769   | 0.048797   || 3.2 | 0.87205  | 0.090190  ||  8.2 | 0.9977899  | 0.0019358
 -1.7 | 0.027077   | 0.057483   || 3.3 | 0.88080  | 0.084840  ||  8.3 | 0.9979754  | 0.0017764
 -1.6 | 0.033291   | 0.066924   || 3.4 | 0.88902  | 0.079720  ||  8.4 | 0.9981456  | 0.0016298
 -1.5 | 0.040484   | 0.077049   || 3.5 | 0.89675  | 0.074830  ||  8.5 | 0.9983017  | 0.0014950
 -1.4 | 0.048721   | 0.087768   || 3.6 | 0.903998 | 0.070169  ||  8.6 | 0.9984450  | 0.0013711
 -1.3 | 0.058054   | 0.098971   || 3.7 | 0.910792 | 0.065735  ||  8.7 | 0.9985763  | 0.0012573
 -1.2 | 0.068527   | 0.11053    || 3.8 | 0.917153 | 0.061524  ||  8.8 | 0.9986967  | 0.0011527
 -1.1 | 0.080168   | 0.12232    || 3.9 | 0.923104 | 0.057532  ||  8.9 | 0.9988071  | 0.0010566
 -1.0 | 0.092904   | 0.13419    || 4.0 | 0.928666 | 0.053753  ||  9.0 | 0.9989083  | 0.00096837
 -0.9 | 0.10700    | 0.14599    || 4.1 | 0.933861 | 0.050181  ||  9.1 | 0.99900102 | 0.00088737
 -0.8 | 0.12218    | 0.15758    || 4.2 | 0.938709 | 0.046810  ||  9.2 | 0.99908599 | 0.00081302
 -0.7 | 0.13851    | 0.16881    || 4.3 | 0.943230 | 0.043632  ||  9.3 | 0.99916383 | 0.00074479
 -0.6 | 0.15593    | 0.17956    || 4.4 | 0.947442 | 0.040641  ||  9.4 | 0.99923513 | 0.00068218
 -0.5 | 0.17440    | 0.18969    || 4.5 | 0.951364 | 0.037828  ||  9.5 | 0.99930044 | 0.00062474
 -0.4 | 0.19384    | 0.19909    || 4.6 | 0.955013 | 0.035186  ||  9.6 | 0.99936024 | 0.00057206
 -0.3 | 0.21419    | 0.20768    || 4.7 | 0.958406 | 0.032708  ||  9.7 | 0.99941499 | 0.00052374
 -0.2 | 0.23535    | 0.21536    || 4.8 | 0.961560 | 0.030385  ||  9.8 | 0.99946512 | 0.00047944
 -0.1 | 0.25723    | 0.22208    || 4.9 | 0.964488 | 0.028211  ||  9.9 | 0.99951100 | 0.00043883
  0   | 0.27973    | 0.22779    || 5.0 | 0.967207 | 0.026177  || 10.0 | 0.99955300 | 0.00040161
  0.1 | 0.30275    | 0.23246    || 5.1 | 0.969728 | 0.024277  || 10.1 | 0.99959143 | 0.00036750
  0.2 | 0.32619    | 0.23608    || 5.2 | 0.972066 | 0.022502  || 10.2 | 0.99962659 | 0.00033624
  0.3 | 0.34993    | 0.23866    || 5.3 | 0.974233 | 0.020846  || 10.3 | 0.99965877 | 0.00030761
  0.4 | 0.37389    | 0.24021    || 5.4 | 0.976239 | 0.019303  || 10.4 | 0.99968820 | 0.00028138
  0.5 | 0.39794    | 0.24075    || 5.5 | 0.978097 | 0.017865  || 10.5 | 0.99971512 | 0.00025735
  0.6 | 0.42201    | 0.24034    || 5.6 | 0.979816 | 0.016527  || 10.6 | 0.99973973 | 0.00023535
  0.7 | 0.44598    | 0.23902    || 5.7 | 0.981405 | 0.015283  ||
  0.8 | 0.46978    | 0.23685    || 5.8 | 0.982875 | 0.014126  ||
  0.9 | 0.49333    | 0.23389    || 5.9 | 0.984233 | 0.013051  ||
  1.0 | 0.51654    | 0.23021    || 6.0 | 0.985488 | 0.012053  ||
  1.1 | 0.53935    | 0.22588    || 6.1 | 0.986646 | 0.011127  ||
  1.2 | 0.56169    | 0.22097    || 6.2 | 0.987715 | 0.010269  ||
  1.3 | 0.58352    | 0.21554    || 6.3 | 0.988702 | 0.0094729 ||
  1.4 | 0.60479    | 0.20969    || 6.4 | 0.989612 | 0.0087358 ||






































is the mid-range. Since $\xi$ is positive, the distribution $f_3(\eta)$ of the variate $\eta$ becomes, from (2),
if we put $\xi\cosh\eta = t$, after integration over $\xi$,
$$f_3(\eta) = \frac{1}{2\cosh^2\eta}\int_0^{\infty} t\,e^{-t}\,dt.$$
This distribution may be written
$$f_3(\eta) = \frac{2}{(e^{\eta}+e^{-\eta})^2} = \frac{2e^{-2\eta}}{(1+e^{-2\eta})^2}.$$
Consequently the distribution $f_4(v)$ of the mid-range
$$v = 2\eta/\alpha$$
is
$$f_4(v) = \frac{\alpha\,e^{-\alpha v}}{(1+e^{-\alpha v})^2}, \tag{21}$$
a result which is already known (Gumbel, 1944). The asymptotic distribution of the mid-range
for a symmetrical initial distribution is symmetrical but not normal. This may be of
interest since it refutes the widespread opinion that all measures of central tendency converge
towards normality. Tables for the distribution (21) are easily constructed.
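
As an indication of how such a table might be built, the sketch below tabulates the density (21) on a grid and checks numerically that it is symmetrical and integrates to unity. The function names, the grid and the value $\alpha = 1$ are illustrative assumptions, not taken from the paper.

```python
import math

def midrange_density(v, alpha):
    """Density (21): alpha * exp(-alpha*v) / (1 + exp(-alpha*v))**2.

    The expression is even in v, so exp(-|alpha*v|) is used to avoid overflow
    for large negative v without changing the value.
    """
    e = math.exp(-abs(alpha * v))
    return alpha * e / (1.0 + e) ** 2

def tabulate(alpha, v_max, step):
    """Return (v, density) pairs on a symmetric grid from -v_max to v_max."""
    n = int(round(v_max / step))
    return [(i * step, midrange_density(i * step, alpha)) for i in range(-n, n + 1)]

table = tabulate(alpha=1.0, v_max=3.0, step=0.5)
for v, dens in table:
    print(f"{v:5.1f}  {dens:.5f}")

# Trapezoidal check that the density integrates to one.
fine = tabulate(alpha=1.0, v_max=40.0, step=0.01)
total = 0.01 * (sum(d for _, d in fine) - 0.5 * (fine[0][1] + fine[-1][1]))
```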


4. TABLES FOR THE ASYMPTOTIC RANGE 


The numerical values of the asymptotic distribution (10) and its integral, the probability
$\Psi(R)$ of the reduced range, are not easy to obtain. The Calculation and Ballistics Department
of the Naval Proving Ground (Dahlgren, Va.) has calculated tables for both functions by
stepwise integration of the differential equation (11), using the special relay calculator of
the International Business Machines Corporation. The calculations started from boundary
values obtained from the British Association Tables (1937). The reduced range chosen varied
from $R = -3.22$ to $R = 10.60$ at intervals of 0.01. The results given to 8 and 9 decimal places
may be in error by two or three units in the last place. These figures* are given in a condensed
form in Tables 1 and 2. It is hoped that they will be sufficient for all practical purposes.


Table 2. Percentage levels of the reduced range

    1      |    2      |     3      |    4      |     5     |        6
 Probability           | Reduced range          | Number of | Probability that
 Small     | Large     | Negative   | Positive  | samples   | R_1 < R < R_N
 Ψ(R_1)    | Ψ(R_N)    | R_1        | R_N       | N         | Ψ(R_N) - Ψ(R_1)

 0.0005    | 0.9995    | -2.999     | 9.875     | 2000      | 0.999
 0.0010    | 0.9990    | -2.829     | 9.099     | 1000      | 0.998
 0.0020    | 0.9980    | -2.642     | 8.314     |  500      | 0.996
 0.0025    | 0.9975    | -2.578     | 8.059     |  400      | 0.995
 0.0050    | 0.9950    | -2.362     | 7.260     |  200      | 0.990
 0.0100    | 0.9900    | -2.118     | 6.445     |  100      | 0.980
 0.0200    | 0.9800    | -1.837     | 5.611     |   50      | 0.960
 0.0250    | 0.9750    | -1.737     | 5.337     |   40      | 0.950
 0.0500    | 0.9500    | -1.386     | 4.464     |   20      | 0.900
 0.1000    | 0.9000    | -0.949     | 3.544     |   10      | 0.800
 0.2000    | 0.8000    | -0.369     | 2.543     |    5      | 0.600
 0.2500    | 0.7500    | -0.133     | 2.193     |    4      | 0.500











* The author wishes to express his sincere appreciation for permission to reproduce these tables. 




Column 1 of Table 1 gives the reduced range $R$, columns 2 and 3 give the probability integral
$\Psi(R)$ and the distribution function $\psi(R)$, respectively.

The percentage levels obtained from the original tables (not reproduced here) are given
in Table 2, in which the reduced ranges $R$ (cols. 3 and 4) are written down as functions of
certain values of the probabilities $\Psi(R)$ (cols. 1 and 2). Col. 5 is the inverse of col. 1. It may
also be interpreted as the solution of
$$\Psi(R_N) = 1 - 1/N, \tag{22}$$
where $R_N$, defined in analogy to (6), stands for the expected largest reduced range in $N$
samples. Col. 6 is obtained from the differences of cols. 2 and 1. The reduced ranges (cols.
3 and 4) of Table 2 are traced against the probabilities (cols. 1 and 2) and the numbers $N$
(col. 5) in Fig. 2. For the sake of completeness, the curves are extended from $\Psi = 0.0003$ and
$1 - \Psi = 0.0003$ up to the median by use of Table 1.
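
Entries of Tables 1 and 2 can be spot-checked with a short numerical sketch. It does not reproduce the stepwise integration of (11) used for the published tables; instead it evaluates the convolution integral (10) directly by the trapezoidal rule, takes $\Psi(R)$ as the probability that the sum of two independent reduced extremes does not exceed $R$, and solves (22) by bisection. The function names, integration limits and step counts are illustrative choices.

```python
import math

def _trapezoid(func, lo, hi, n=4000):
    """Composite trapezoidal rule on [lo, hi] with n steps."""
    h = (hi - lo) / n
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, n):
        total += func(lo + i * h)
    return total * h

def psi(R):
    """Density of the reduced range from formula (10)."""
    integrand = lambda y: math.exp(-math.exp(-y) - math.exp(y - R))
    # The integrand is negligible outside [-8, R + 8].
    return math.exp(-R) * _trapezoid(integrand, -8.0, R + 8.0)

def Psi(R):
    """Probability integral: convolution of the extreme-value density exp(-y - e^-y)
    with the extreme-value probability exp(-e^-(R - y))."""
    integrand = lambda y: math.exp(-y - math.exp(-y) - math.exp(y - R))
    return _trapezoid(integrand, -8.0, R + 8.0)

def reduced_range_level(p, lo=-5.0, hi=15.0):
    """Solve Psi(R) = p for R by bisection (Psi is increasing in R)."""
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if Psi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(Psi(0.0), 5), round(psi(0.0), 5))      # compare with the R = 0 row of Table 1
print(round(reduced_range_level(1 - 1 / 10), 3))   # compare with R_N for N = 10 in Table 2
```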
[Fig. 2. Probability levels for the reduced range. The reduced range $R$ is plotted against the upper probability level and the number of ranges (upper scales) and against the lower probability level (lower scale).]


The mode $\tilde R$, the median $\breve R$, and the mean $\bar R$ are, respectively,
$$\tilde R = 0.506366440; \qquad \breve R = 0.928597642; \qquad \bar R = 1.154431330.$$


CONCLUSIONS


The asymptotic distributions of the ranges and mid-ranges for an unlimited symmetrical 
initial distribution of the exponential type are obtained from the convolution of the 
asymptotic distributions of the extremes. Elfving’s approach for the normal range leads 
to the same results as this direct method which does not require the knowledge of the sample 
size, nor of the analytical form of the initial distribution, nor of the numerical values of its 
parameters. 










REFERENCES 


British Association (1937). Mathematical Tables, 6, Bessel Functions, Part 1: Functions of order zero
and unity. Cambridge University Press.

Elfving, G. (1947). The asymptotical distribution of range in samples from a normal population.
Biometrika, 34, 111.

Fisher, R. A. & Tippett, L. H. C. (1928). Limiting forms of the frequency distribution of the largest
or smallest member of a sample. Proc. Camb. Phil. Soc. 24, 180.

Gumbel, E. J. (1935). Les valeurs extrêmes des distributions statistiques. Ann. Inst. Henri Poincaré, 4,
115.

Gumbel, E. J. (1944). Ranges and mid-ranges. Ann. Math. Statist. 15, 414.

Gumbel, E. J. (1946). On the independence of the extremes in a sample. Ann. Math. Statist. 17, 78.

Gumbel, E. J. (1947a). The asymptotic distribution of the range. Bull. Amer. Math. Soc. 53, 68.

Gumbel, E. J. (1947b). The distribution of the range. Ann. Math. Statist. 18, 384.

Pearson, E. S. (1926). A further note on the distribution of range in samples taken from a normal
population. Biometrika, 18, 173.




SYSTEMS OF FREQUENCY CURVES GENERATED 
BY METHODS OF TRANSLATION 


By N. L. JOHNSON 


1. INTRODUCTION 
1.1. Preliminary remarks


This paper is concerned with the discussion of some of the uses which may be made of trans- 
formations of variables such that the transformed variables may be considered to have a 
normal distribution. The concept of such transformations was put forward by Edgeworth 
(1898) and termed by him the Method of Translation. Edgeworth considered, in fact, only 
transformations which could be represented by polynomials, as did Kapteyn (1903). Later, 
however, Kapteyn & Van Uven (1916), Wicksell (1917) and Rietz (1922) extended the 
method to more general kinds of transformation. As we shall see later, the particular case of 
the logarithmic transformation, which is given some prominence in each of the last two 
references, had been anticipated by other authors who had not, however, considered the 
transformation as more than a special device applicable to particular cases. 
1.2. Historical development

It is of interest to consider the reasons why the need for such transformations should have 
arisen. There is no doubt that in the earlier phases of their development the primary object 
was that of graduating observed frequency distributions. The normal distribution had played 
a dominant role in both theoretical and applied statistics since the time of Laplace. It was, 
however, apparent that the normal curve could not provide an adequate representation of 
many of the distributions encountered in statistical practice. Towards the end of the nineteenth
century attempts were made to construct systems of frequency curves which should
be capable of representing a wider variety of distributions than those for which a normal 
curve would suffice. It may be noted that the most obvious departure from normality was 
that which is described as skewness, and that much of the work at this time was described 
as the construction of systems of ‘skew frequency curves’. The most successful of the 
systems then proposed have been those of K. Pearson (1895) and Charlier (1905). The work of 
Edgeworth and others, referred to in § 1.1, constitutes a third line of approach which, though
not so widely used as those of Pearson and Charlier, has certain advantages of its own. In
view of the important position occupied by the normal curve, it was, of course, natural to
consider the possibility of relating observed distributions to the standard form. The fact 
that functions associated with the normal curve were well tabulated must also have been 
a strong contributory factor. An important reason for the lack of general acceptance of the 
method of translation is the fact that it became apparent that compared with the Pearson 
system the curves proposed covered only a very limited variety of shapes. A similar criticism 
might, of course, be directed at the Charlier system, though the latter system possesses 
advantages in respect of the aid which its analytic form offers to theoretical investigations. 

The main purpose of this paper is to propose certain systems of curves derived by the 
method of translation, which, it is hoped, retain most of the advantages while eliminating 
some of the drawbacks of the systems first based on this method. 










1.3. Transformation to normality


Subsequently to the construction of the systems of curves described in § 1.2, the normal
distribution has gained added importance as a result of developments in statistical theory. 
In particular, the theory of significance tests and the associated probability distributions 
have been worked out much more thoroughly for normal populations than for other cases. 
Originally this may have been a consequence of the theoretical importance of the normal 
distribution, based on Laplace’s theorem and the central limit theorems, but a factor of 
considerable importance is the simplicity of results based on normal populations. ‘Normal 
theory’, as it may be termed, is so much simpler than theory based on any general system of 
curves that it is of great importance to be able to use it if possible. To this end, two lines of 
inquiry have been put forward. E. S. Pearson (1931) and R. C. Geary (1947), inter alia, have
considered the problem of how far normal theory may be invalidated by various kinds of 
departure from normality in the original distributions. The other approach is in effect an 
application of the method of translation. A function of the observed variable is sought which 
shall be, with sufficient approximation, a normal variable. Normal theory, with its simplicity 
and convenience, is then applied to the transformed variables. Curtiss (1943) gives a good 
critical summary of many of these methods. It may be noted that the interest, in these 
applications, lies in the significance tests to be applied and not in the creation of systems of 
frequency curves. 

A further application of the method of translation is found in the approximate normalization
of certain test criteria. In this case it is implicitly assumed that the original distribution
is normal, and the method of translation is used to simplify certain parts of 'normal theory'.
Examples are the Wilson-Hilferty (1931) transformation of $\chi^2$, and the transformations
proposed by Hotelling & Frankel (1938) and by Cornish & Fisher (1937).


1.4. General theoretical background


Pretorius (1930), in the course of a long paper dealing with non-normal distributions, 
remarks: ‘The superiority of one frequency function over another depends rather on the 
success with which that function can be applied to graduate data than on the manner in 
which it originated.’ This point of view has much to recommend it and, if accepted, absolves 
us from the necessity of providing a plausible probability theory basis for any proposed 
system of frequency curves. On the other hand, it must be remembered that the normal 
curve was first reached from probability theory rather than from the graduation of data. 
While, therefore, from a utilitarian point of view a probability theory basis is unnecessary, 
it is useful to keep the theory in mind when constructing new systems. For example, Pearson’s 
fundamental differential equation was based on certain considerations of probability, 
though it was applied in cases where these considerations could hardly be presumed valid. 
Rather similarly the method of translation can be related to probability theory in a general 
and somewhat tentative manner. The argument, as described below, is due to Kapteyn 
(1903) and Wicksell (1917). 

The normal distribution can be considered as arising from the summation of a large 
number of small independent effects which have occurred in a specified order. If it be now 
supposed that the magnitude of an effect be proportional to some function of the value of 
the variable before the addition of the effect, it can be shown that a certain function of the 
final variables should be normally distributed. 











Suppose $x_1, x_2, \dots$ to be independent random variables, each capable of taking only a small
range of values near zero. The first sentence of the preceding paragraph can be interpreted
as meaning that
$$X_n = x_1 + x_2 + \dots + x_n \tag{1}$$
will be approximately normal if $n$ is large. The second part of the paragraph means that if
$$Y_n = x_1 + x_2\,G(Y_1) + \dots + x_n\,G(Y_{n-1}), \tag{2}$$
where $G$ is some function, then it is possible to determine a function $f(Y_n)$ which is
approximately normally distributed. This would be the case if
$$f(Y_n) = x_1 + x_2 + \dots + x_n.$$
Now if this be so,
$$f(Y_n) - f(Y_{n-1}) = x_n. \tag{3}$$
From (2),
$$Y_n - Y_{n-1} = x_n\,G(Y_{n-1}). \tag{4}$$
Hence
$$\frac{f(Y_n) - f(Y_{n-1})}{Y_n - Y_{n-1}} = \frac{1}{G(Y_{n-1})}.$$
Since $x_n$ is supposed small, it follows that
$$f'(Y) \approx 1/G(Y). \tag{5}$$
Van Uven, in an Appendix to Kapteyn & van Uven (1916), and Baker (1934) have pointed
out that it is always possible in theory to transform any continuous distribution into a normal
distribution. Van Uven gives a graphic method of doing so, while Baker proposes an
approximation based on the method of moments. In both cases, however, practical difficulties are
considerable.

The parallelism of equation (5) and the equation for a function which shall 'equalize
variances' (when the standard deviation is proportional to the function $G$ of the expected
value) is notable, and, in fact, equalization of variance and approximate normalization often
go together. Curtiss (1943) gives a full discussion of these two aspects of certain
transformations used in the analysis of variance.

Recently, in connexion with problems concerning the distribution of particle sizes,
Kolmogoroff (1941), Halmos (1944) and Epstein (1947) have developed another theoretical
basis leading, under certain conditions, to the most common form of the distributions which
we shall consider (see § 3.1 below).
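
The argument leading to (5) can be illustrated by a small simulation, offered only as a sketch under stated assumptions. The choice $G(Y) = Y$ (effects proportional to the current value), for which (5) gives $f(Y) = \log Y$, is one convenient case; the starting value $Y = 1$, the uniform distribution of the small effects and the function name are all hypothetical.

```python
import math
import random

def simulate(n_effects, half_width, seed):
    """Apply n small effects, each multiplied by G(Y) = Y, starting from Y = 1.

    Returns the final Y and the plain sum of the effects.
    """
    rng = random.Random(seed)
    y = 1.0
    effect_sum = 0.0
    for _ in range(n_effects):
        effect = rng.uniform(-half_width, half_width)  # small effect near zero
        y += effect * y                                # step (4) with G(Y) = Y
        effect_sum += effect
    return y, effect_sum

y_final, effect_sum = simulate(n_effects=1000, half_width=0.01, seed=1)
print(math.log(y_final), effect_sum)
```

The two printed numbers differ only by the accumulated second-order terms, so $\log Y_n$ behaves like the simple sum in (1); this is the sense in which a function of the final variable may be approximately normal.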


1.5. Order of discussion

In this paper we shall not consider in any further detail theoretical arguments for the use 
of transformations to normality. We shall be concerned, rather, with the study of the pro- 
perties of distributions for which simple transformations to normality are possible. 

In §§ 2.1 to 2.4 the problem will be considered in a general manner. Certain properties which
are valid for wide classes of transformation will be described and a basis will be developed
for the discussion of any special system. Three such special systems will be put forward,
and their properties discussed, in §§ 3.1 to 3.6; bivariate distributions based on these systems
will be considered in a later paper.

2. GENERAL THEORY 
2.1. Translation as a method of generating systems of frequency curves
Any curve of the Pearson system of frequency curves is a solution of the differential equation
$$\frac{1}{y}\frac{dy}{dx} = \frac{a + x}{c_0 + c_1 x + c_2 x^2},$$
and is defined by the values of the parameters $a$, $c_0$, $c_1$ and $c_2$ in that equation. Somewhat
similarly, a curve in the Charlier A system is defined by the values of coefficients in the
well-known expansion of derivatives of the normal function.

If we write a transformation of a variable $x$ to normality in the formal manner
$$z = f(x),$$
where $z$ is a unit normal variable, we have, clearly, defined a multiply infinite system of
frequency curves, corresponding to the possible functions $f(x)$ which might be chosen. In
order to obtain a system of curves analogous to the Pearson or Charlier systems, $f(x)$ must be
specialized, preferably in a simple form, and made to depend on a certain number of
parameters. The values of these parameters will then determine which curve of the system
represents the distribution of $x$.


It is convenient to introduce four parameters (as in the case of the Pearson system)
and to write
$$z = \gamma + \delta f\left(\frac{x-\xi}{\lambda}\right). \tag{6}$$
Here $f$ is, preferably, a function of simple form, depending on no variable parameters.
$f\{(x-\xi)/\lambda\}$ should also be a monotonic function of $x$. Without loss of generality it will be
supposed that $f\{(x-\xi)/\lambda\}$ is a non-decreasing function of $x$ and that $\delta$ and $\lambda$ are positive.


From quite general considerations, it is possible to appreciate the roles played by certain
of the parameters. If we write
$$y = (x-\xi)/\lambda,$$
then
$$z = \gamma + \delta f(y), \tag{7}$$
whence
$$p(y) = \delta f'(y)\, p(z)\big|_{z=\gamma+\delta f(y)} \tag{8.1}$$
$$\phantom{p(y)} = \frac{\delta}{\sqrt{2\pi}}\, f'(y) \exp\{-\tfrac12[\gamma + \delta f(y)]^2\}. \tag{8.2}$$
Equation (8.1) is, of course, of general validity and does not depend on the definition of $z$
as a unit normal variable. Since $x = \xi + \lambda y$, it follows that the distribution of $x$ will be of the
same shape as that of $y$, which is given, in general, by (8.1). The standard deviation of $x$ will
be $\lambda$ times that of $y$, while changes in $\xi$ will affect only the expected value (or other central
measure) of the distribution of $x$.

It follows that the parameters $\gamma$ and $\delta$ determine the shape of the distribution of $x$, that
$\lambda$ is a scale factor and $\xi$ a location factor. It follows also that attention should be concentrated
on the relation between the values of $\gamma$ and $\delta$ and the distribution of $x$, since the parameters
$\xi$ and $\lambda$ affect the distribution only in a simple manner. It will therefore be convenient to
take as our standard form of transformation
$$z = \gamma + \delta f(y), \tag{7 bis}$$
rather than the more complicated expression (6), and to investigate the relation between
$\gamma$, $\delta$ and the shape of the distribution of $y$.

2.2. Requirements of a translation system


The system of frequency curves obtained depends on the function $f(y)$ which is chosen.
For practical convenience this function should possess the following properties:

(1) It should be a monotonic function of $y$.

(2) Apart from being simple in form it should be easy to calculate. Preferably, tables of
the function should be in existence.

(3) The range of values of $f(y)$ corresponding to the actual range of possible values of $y$
should be from $-\infty$ to $+\infty$. Although good approximation may sometimes be obtained even
when this requirement is ignored, it is highly desirable that it should be satisfied, since $z$,
being a normal variable, is supposed to vary from $-\infty$ to $+\infty$.

(4) The resulting system of distributions of $y$ (and so of $x$) should include distributions
of most, if not all, of the kinds encountered in collected data.






























[Fig. 1. The transformation $z = \gamma + \delta f(y)$: axis of $y$, scale of $x$, median, and probability density function of $y$ (or $x$).]

2.3. General properties of translation systems
In this section we shall study the transformation $z = \gamma + \delta f(y)$, remembering that $x$ is
related to $y$ by the linear equation $x = \xi + \lambda y$. We shall suppose in the first place that $z$ is
a standardized variable with a symmetrical distribution. The general properties of the
relationship can be most easily appreciated with the help of Fig. 1.* This diagram actually
represents the case where $z$ is a standardized normal variable and the function $f$ has the form
$\log\{y/(1-y)\}$, but it illustrates the general properties of the transformation.

Relatively to the base-line $ABC$, which is parallel to and at a distance $\gamma$ from the axis of $y$,
the dotted curve has been plotted with ordinates $f(y)$ and abscissae $y$. For the solid-line
curve the ordinates, measured from $ABC$, have been multiplied by $\delta$. As a result, when
referred to the axes of $y$ and $z$, the solid-line curve represents the functional relation between
$y$ and $z$.

* I am indebted to Prof. E. S. Pearson for suggesting the use of this diagram.

The effect of the distortion of the z-scale, due to this relationship, on the distribution of y 
(or of 2) is also illustrated. The shaded columns, equal in area, under the two distribution 
curves, represent the probabilities of z and y (or 2) falling in corresponding small intervals 
éz and dy (or 6x). Clearly where f’(y) has a high value, the contraction on the y-scale due to 
the transformation is greater than where f’(y) is smaller. The values of y and é affect the 
distribution of y in so far as they determine over what parts of the total range of y these 
augmentations and diminutions of probability density shall occur, and to which parts of 
the distribution of z they shall correspond. 

As $\delta$ is increased, it is seen that the range within which observations are likely to be found
(e.g. corresponding to $-3 < z < 3$) will correspond to a smaller and smaller length of the
dotted curve representing $f(y)$, which in the limit may be regarded as linear. Thus if $y_0$ be
defined by
$$0 = \gamma + \delta f(y_0), \tag{9}$$
it is seen that $y_0$ will be the median of the $y$ distribution and, further, if $\delta$ be sufficiently large,
we shall have to a close degree of approximation
$$z \approx \delta\,(y - y_0)\,f'(y_0) \tag{10}$$
for the bulk of the distribution of $y$. (10) may be written
$$y \approx y_0 + z/\{\delta f'(y_0)\}. \tag{11}$$
Hence if $\delta$ is large, $y$ will have a distribution of approximately the same shape as $z$. We also
note from (11) that an increase in $\delta$ may be expected to decrease the standard deviation of $y$.
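The quality of the linear approximation (11) can be examined numerically. The following is a minimal sketch, assuming the particular form $f(y) = \log\{y/(1-y)\}$ used in Fig. 1; the function names and the grid of $z$ values are illustrative choices and do not come from the paper.

```python
import math

def f(y):
    """Transformation function of Fig. 1: f(y) = log{y/(1-y)}."""
    return math.log(y / (1.0 - y))

def f_prime(y):
    """Derivative f'(y) = 1/{y(1-y)}."""
    return 1.0 / (y * (1.0 - y))

def y_exact(z, gamma, delta):
    """Exact inversion of z = gamma + delta*f(y) for the logit form of f."""
    return 1.0 / (1.0 + math.exp(-(z - gamma) / delta))

def y_linear(z, gamma, delta):
    """Approximation (11): y ~ y0 + z/{delta*f'(y0)}, y0 being the median from (9)."""
    y0 = y_exact(0.0, gamma, delta)
    return y0 + z / (delta * f_prime(y0))

def max_error(gamma, delta):
    """Largest discrepancy between exact and linearized y over -3 <= z <= 3."""
    zs = [k / 10.0 for k in range(-30, 31)]
    return max(abs(y_exact(z, gamma, delta) - y_linear(z, gamma, delta)) for z in zs)

if __name__ == "__main__":
    for delta in (1.0, 5.0, 20.0):
        print(delta, max_error(0.5, delta))
```

Under these assumptions the discrepancy over $-3 < z < 3$ shrinks as $\delta$ grows, which is the behaviour described in the text.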

We shall now restrict ourselves to a special class of transformation functions. A transformation
will be called symmetrical if there is a unique number $\eta$ such that
$$f(\eta + y') = -f(\eta - y')$$
for all $y'$.* It follows that $f(\eta) = 0$. For symmetrical transformations, therefore, $y_0 = \eta$ if
$\gamma = 0$; further, if $\gamma = 0$, the distribution of $y$ is symmetrical about $\eta$ since the changes in
probability density are symmetrical about the centre of the distribution of $z$. If $\gamma$ is not zero,
the distribution of $y$ is skew. The parameter $\gamma$ is thus particularly associated with skewness.
In general, however, $\delta$ also affects skewness, and $\gamma$ affects the kurtosis. As suggested by
Fig. 1, other relations may be traced between (a) the shape of the distribution of $y$, and (b) the
form of $f(y)$ and the magnitude and sign of $\gamma$.


2-4. Fitting and errors in fitting 
The methods of fitting curves in general use are 
(i) The method of percentile points. 
(ii) The method of moments. 
(iii) The method of maximum likelihood. 
Method (i) is peculiarly suitable for fitting curves of a translation system. The percentile 


points of the distribution of y can easily be expressed in terms of the corresponding points 
of the distribution of z, and these latter will usually be tabled. 


* The transformation shown in Fig. 1, $f(y) = \log\{y/(1-y)\}$, is symmetrical about $\eta = \frac{1}{2}$.










Should the moments of $y$ be of fairly simple form, method (ii) may be used. If all four
parameters $\gamma$, $\delta$, $\xi$ and $\lambda$ are to be estimated, $\gamma$ and $\delta$ are first determined from $\beta_1(x)$ and $\beta_2(x)$;
then $\xi$ and $\lambda$ are determined so that agreement in mean and standard deviation is obtained.

Method (iii) is rather difficult to apply to translation systems. However, a method of
successive approximation can be worked out which, though tedious, is straightforward and
applicable to all cases of transformation to normality.

Although the process of fitting reduces to the estimation of $\gamma$, $\delta$, $\xi$ and $\lambda$, the accuracy of
these estimates is usually of less intrinsic interest than the accuracy of probabilities (or
expected frequencies) calculated from the fitted curve. For a given form of distribution of $z$,
it is a simple matter to investigate the variation in computed probabilities associated with
variation in the values assigned to the parameters, assuming that the correct form of function
$f\{(x-\xi)/\lambda\}$ has been chosen. A brief study of the effect of an incorrect choice of this function
in certain special cases will be given in § 3-6.


3. SPECIAL SYSTEMS 


3-1. The log-normal system 
The most common transformation of type (6) is that termed by Gaddum (1945) the log-normal
transformation. If
$$z = \gamma + \delta\log\left(\frac{x-\xi}{\lambda}\right) \tag{12-1}$$
or
$$z = \gamma + \delta\log y, \tag{12-2}$$
$z$ being a unit normal variable, the distribution of $x$ (or of $y$) is said to be log-normal.

The transformation was proposed by Galton (1879), anticipating the form of argument 
used by Kapteyn, and some properties of the distribution were obtained by McAlister (1879). 
Fechner (1897) also used the transformation in a special application, but the idea was not 
further pursued by these authors. Kapteyn & Van Uven (1916) gave a graphical method of 
fitting the distribution and investigated its shape. Wicksell (1917) dealt rather more fully 
with the subject. He pointed out that the log-normal transformation is obtained by putting 
G(Y) = Y in (5); that is to say, by assuming random increments proportional to the variable 
to which they apply. Wicksell also obtained the moments of the distribution of y. We have 


$$\mu'_r(y) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{r(z-\gamma)/\delta}\,e^{-\frac{1}{2}z^2}\,dz = e^{\frac{1}{2}r^2\delta^{-2} - r\gamma\delta^{-1}}. \tag{13}$$
It follows that
$$\beta_1 = (\omega-1)(\omega+2)^2 \quad (\sqrt{\beta_1} > 0), \qquad \beta_2 = \omega^4 + 2\omega^3 + 3\omega^2 - 3, \tag{14}$$
where $\omega = e^{\delta^{-2}}$. The $(\beta_1, \beta_2)$ points for log-normal distributions therefore lie on a curve defined
by the parametric equations (14). This curve is shown in Fig. 2. This restriction of the locus
of $(\beta_1, \beta_2)$ is to be expected since (12-1) can be written
$$z = (\gamma - \delta\log\lambda) + \delta\log(x-\xi),$$
so that there are only three independent parameters and, without any loss of generality,
(12-1) may be rewritten
$$z = \gamma + \delta\log(x-\xi). \tag{15}$$
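As a numerical illustration of (13) and (14), the sketch below computes $\beta_1$ and $\beta_2$ of $y$ in two ways: from the raw moments (13), and from the closed forms (14) in $\omega$. The function names are hypothetical; only the formulas themselves are taken from the text.

```python
import math

def raw_moment(r, gamma, delta):
    """r-th moment of y about zero, equation (13)."""
    return math.exp(0.5 * r * r / delta**2 - r * gamma / delta)

def betas_from_moments(gamma, delta):
    """beta_1 and beta_2 obtained from the first four raw moments."""
    m1, m2, m3, m4 = (raw_moment(r, gamma, delta) for r in (1, 2, 3, 4))
    mu2 = m2 - m1**2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1**3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4
    return mu3**2 / mu2**3, mu4 / mu2**2

def betas_from_omega(delta):
    """beta_1 and beta_2 from the parametric equations (14)."""
    w = math.exp(delta**-2)
    return (w - 1) * (w + 2)**2, w**4 + 2 * w**3 + 3 * w**2 - 3

if __name__ == "__main__":
    print(betas_from_moments(0.7, 1.5))
    print(betas_from_omega(1.5))
```

The two routes should agree for any $\gamma$, which reflects the fact that $\gamma$ only rescales $y$ and so cannot move the $(\beta_1, \beta_2)$ point off the log-normal line.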













Wicksell also proposed a method of fitting log-normal distributions based on the observed
moments $m'_1$, $m_2$ and $m_3$ of the distribution of $x$. The positive root of the equation
$$t^3 + 3t - \sqrt{b_1} = 0 \tag{16}$$
(where $\sqrt{b_1} = m_3/m_2^{3/2}$) is found. $\xi$, in (15), is then estimated by means of the formula
$\xi = m'_1 - \sqrt{m_2}/t$. Estimates of $\gamma$ and $\delta$ are then obtained quite straightforwardly from (13).
Yuan (1933) gave tables to facilitate the solution of (16). Quensel (1945) has given expressions
for the standard deviations of estimates obtained by this method. Finney (1941) pointed out
that the mean and variance of the transformed variable $\log(x-\xi)$ should be used if efficient
estimates of $\gamma$ and $\delta$ are to be obtained, and obtained expressions for such estimates. These
could not of course be applied directly if $\xi$ is unknown.
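Wicksell's moment method can be sketched in a few lines. The code below is an illustration only: the root of (16) is found by bisection (standing in for Yuan's tables), and the expressions for $\delta$ and $\gamma$ are the ones implied by (13) and (14) on putting $t^2 = \omega - 1$. The function names are assumptions, and positive skewness ($m_3 > 0$) is assumed throughout.

```python
import math

def positive_root(sqrt_b1):
    """Positive root of t^3 + 3t - sqrt(b1) = 0 (equation (16)), by bisection."""
    lo, hi = 0.0, max(1.0, sqrt_b1)   # the cubic is >= 0 at hi for any sqrt_b1 > 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid**3 + 3.0 * mid - sqrt_b1 < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fit_lognormal_by_moments(m1, m2, m3):
    """Estimate (xi, gamma, delta) of z = gamma + delta*log(x - xi).

    m1 is the mean; m2 and m3 are the second and third central moments.
    """
    t = positive_root(m3 / m2**1.5)
    xi = m1 - math.sqrt(m2) / t                      # Wicksell's formula for xi
    delta = 1.0 / math.sqrt(math.log(1.0 + t * t))   # since omega = 1 + t^2 = exp(delta^-2)
    # The mean of x - xi is sqrt(m2)/t = exp(-gamma/delta + 1/(2 delta^2)), by (13).
    gamma = 0.5 / delta - delta * math.log(math.sqrt(m2) / t)
    return xi, gamma, delta

if __name__ == "__main__":
    print(fit_lognormal_by_moments(3.868, 0.992, 1.204))
```

When exact population moments are supplied the parameters are recovered exactly; with sample moments the estimates inherit the sampling variability discussed by Quensel.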


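If $y$ is log-normal
$$p(y) = \frac{\delta}{\sqrt{2\pi}}\,\frac{1}{y}\,e^{-\frac{1}{2}(\gamma+\delta\log y)^2} \quad (0 < y). \tag{17}$$
Yuan pointed out that this distribution has infinitely high contact at either end of its range
of variation, since
$$\lim_{y\to 0} y^{-n}p(y) = 0 = \lim_{y\to\infty} y^{n}p(y)$$
for all values of $n$.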

The log-normal system has proved useful in a number of applications. We may mention 
its use in dosage-mortality problems (e.g. Gaddum, 1945) in the graduation of economic data 
(Gibrat, 1931; Frechet, 1945) and in agriculture (Cochran, 1938). Williams (1937, 1940) has 
applied the system to a varied collection of problems. 


3-2. Extension of the logarithmic type of transformation 

Despite its successful application in a number of cases, the log-normal system is restricted
in flexibility, just as is Pearson's Type III distribution, because the associated $(\beta_1, \beta_2)$
point must lie on the curve defined by equation (14). It seems reasonable to suppose that
useful extensions of the system might be obtained by using different functions $f(y)$ in (7)
(or $f\{(x-\xi)/\lambda\}$ in (6)). We shall now consider the construction of such new systems, and will
start by laying down certain properties which it appears desirable that they should possess.

(i) In order to avoid a restricted locus of variation for $(\beta_1, \beta_2)$, the function $f$ should be
such that in equation (6) there are four truly independent parameters.

(ii) The new systems should fit in naturally with the log-normal system which could be
regarded as a transition form, lying between two systems of distributions, one with a range
of variation bounded at both extremities, the other unbounded at either extremity. By
analogy with the Pearson system, it is to be expected that in the $(\beta_1, \beta_2)$ plane the system
with a bounded range of variation will cover the region between the log-normal line, and the
limiting line $\beta_2 - \beta_1 - 1 = 0$; while the other system will cover the remainder of the $(\beta_1, \beta_2)$
plane.

These regions are indicated in Fig. 2, wherein are also introduced the symbols $S_L$ for
'log-normal system', $S_B$ for 'bounded system' and $S_U$ for 'unbounded system', which will
be used in the remainder of this paper. It may be noted that the scheme of curves sketched
above is not strictly analogous to the Pearson system, as in the latter there is a region in the
$(\beta_1, \beta_2)$ plane corresponding to range of variation bounded at one end only (Type VI).

(iii) Finally, in the choice of the function $f(y)$ the considerations detailed in § 2-2 must be
kept in mind, and the requirements therein scheduled satisfied as far as is possible.














3-3. Choice of new transformation functions 
Consider the log-normal variable $x$, defined by
$$z = \gamma + \delta\log(x-\xi) \quad (\xi < x).$$
Putting $y = 1 - \xi/x$, we have
$$z = (\gamma + \delta\log\xi) + \delta\log\frac{y}{1-y} \quad (0 < y < 1). \tag{18}$$
Fig. 2. [The $(\beta_1, \beta_2)$ plane, with axes of $\beta_1$ and $\beta_2$ and the impossible area marked. The log-normal line is marked $S_L$; the Pearson Type III line, the Pearson Type V line and the boundary of bimodal curves of system $S_B$ are also shown.]

A transformation of this type is also obtained from the general formula (7) by putting
$f(y) = \log\{y/(1-y)\}$. Putting $y = (x-\xi)/\lambda$, we have as a particular case of (6)
$$z = \gamma + \delta\log\frac{x-\xi}{\xi+\lambda-x} \quad (\xi < x < \xi+\lambda). \tag{19}$$
The system of curves generated by (18) or (19), $z$ being a unit normal variable, will be our
system $S_B$.













We may note that the proposed function $f(y)$ satisfies the conditions laid down in § 2-2,
that it should be simple and calculable without undue difficulty. In fact
$$f(y) = \log\frac{y}{1-y} = 2\tanh^{-1}(2y-1). \tag{20}$$
Tables of inverse hyperbolic tangents (Milne-Thompson & Comrie, 1931) may therefore
be used to evaluate $f(y)$. We further note that $f(y)$ has the desirable property that it increases
from $-\infty$ to $+\infty$ as $y$ increases from 0 to 1.

The transformation (19) was suggested by Wicksell (1917) as possibly being worthy of
study. Bartlett (1937) has stated that (19) proves useful in certain analysis of variance
problems. It may also be noted that Fisher's $z'$ transformation for the correlation coefficient
is of form (19) with
$$\xi = -1, \quad \lambda = 2, \quad \gamma = -\tfrac{1}{2}\sqrt{n-3}\,\log\frac{1+\rho}{1-\rho}, \quad \delta = \tfrac{1}{2}\sqrt{n-3},$$
where $n$ = sample size, $\rho$ = population correlation coefficient and $x = r$, the sample
correlation.


The construction of our system $S_U$ is rather more arbitrary, though suggested by analogy
with $S_L$ and $S_B$. The transformation by which we shall define $S_U$ is
$$z = \gamma + \delta\sinh^{-1}\left(\frac{x-\xi}{\lambda}\right) = \gamma + \delta\log\left[\frac{x-\xi}{\lambda} + \sqrt{\left(\frac{x-\xi}{\lambda}\right)^2 + 1}\,\right] \tag{21}$$
or
$$z = \gamma + \delta\sinh^{-1}y = \gamma + \delta\log\left[y + \sqrt{y^2+1}\,\right]. \tag{22}$$
Milne-Thompson & Comrie's tables of inverse hyperbolic sines may be used to evaluate
$f(y)$ in this case. As required in § 2-2, $f(y)$ increases from $-\infty$ to $+\infty$ as $y$ increases from $-\infty$
to $+\infty$. Beall (1942) and Bartlett (1947) have suggested the use of the function $\sinh^{-1}\sqrt{y}$
or $\sinh^{-1}\sqrt{y+\tfrac{1}{2}}$, especially with reference to negative binomial variables.








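3-4. The system $S_B$
From (19) we have immediately
$$p(y) = \frac{\delta}{\sqrt{2\pi}}\,\frac{1}{y(1-y)}\exp\left[-\tfrac{1}{2}\left\{\gamma + \delta\log\frac{y}{1-y}\right\}^2\right] \quad (0 < y < 1). \tag{23}$$
Hence
$$p(y) = \frac{\delta}{\sqrt{2\pi}}\,e^{-\frac{1}{2}\gamma^2}\,y^{-(\frac{1}{2}\delta^2\log y + \gamma\delta + 1)}\,(1-y)^{-\frac{1}{2}\delta^2\log(1-y) + \gamma\delta - 1}\,e^{\delta^2\log y\,\log(1-y)},$$
so that
$$\lim_{y\to 0} y^{-n}p(y) = 0 = \lim_{y\to 1}(1-y)^{-n}p(y)$$
for any value of $n$. The distribution curve of $y$ therefore has 'high contact' at either end of
its finite range of variation.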


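Inverting (19) we have
$$y = \left(1 + e^{-(z-\gamma)/\delta}\right)^{-1}. \tag{24}$$
Hence the median value of $y$ is $(1 + e^{\gamma/\delta})^{-1}$. The equation to be satisfied by any modal value
of $y$, other than the extremities of the range of variation, is
$$2y - 1 = \delta\left(\gamma + \delta\log\frac{y}{1-y}\right).$$
Putting $y = \tfrac{1}{2}(y'+1)$,
$$y' - \gamma\delta = \delta^2\log\frac{1+y'}{1-y'}. \tag{25}$$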










The number of intersections of the straight line $u = y' - \gamma\delta$ and the curve
$$u = \delta^2\log\frac{1+y'}{1-y'} \tag{25-1}$$
determines whether the distribution of $y$ is or is not bimodal. If there is only one intersection,
the distribution is unimodal; if there are three intersections it is bimodal. Supposing, for
the moment, that $\gamma > 0$, there is clearly one intersection in the interval $-1 < y' < 0$. There
may be two other intersections. These must be in the interval $0 < y' < 1$ if they exist. In the
limiting case, the straight line in (25-1) will touch the curve at some point in the interval
$0 < y' < 1$. At this point the slopes of line and curve must be equal, so that
$$1 = 2\delta^2(1-y'^2)^{-1}, \quad\text{i.e.}\quad y' = \sqrt{1-2\delta^2}.$$
Hence the line will touch the curve at this value of $y'$ if
$$\sqrt{1-2\delta^2} - \gamma\delta = 2\delta^2\tanh^{-1}\sqrt{1-2\delta^2},$$
i.e.
$$\gamma = \delta^{-1}\sqrt{1-2\delta^2} - 2\delta\tanh^{-1}\sqrt{1-2\delta^2}.$$
It follows that the necessary and sufficient conditions for bimodality (whatever the sign of $\gamma$)
are
$$\delta < 1/\sqrt{2}, \qquad |\gamma| < \delta^{-1}\sqrt{1-2\delta^2} - 2\delta\tanh^{-1}\sqrt{1-2\delta^2}. \tag{26}$$
Table 1 shows the limiting values of $|\gamma|$ for various values of $\delta$.
Table 1

  $\delta$    Maximum $|\gamma|$      $\delta$    Maximum $|\gamma|$
  0.7       0.0027              0.3       2.12
  0.6       0.175               0.2       4.02
  0.5       0.533               0.1       9.37
  0.4       1.12
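The limiting values in Table 1 follow directly from condition (26). A minimal sketch of the calculation is given here; the function name is illustrative, and `math.atanh` stands in for the tables of inverse hyperbolic tangents.

```python
import math

def max_abs_gamma(delta):
    """Upper limit of |gamma| for bimodality of S_B, from condition (26).

    Returns None when delta >= 1/sqrt(2), in which case no bimodal curve exists.
    """
    if delta >= 1.0 / math.sqrt(2.0):
        return None
    root = math.sqrt(1.0 - 2.0 * delta**2)
    return root / delta - 2.0 * delta * math.atanh(root)

if __name__ == "__main__":
    for delta in (0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1):
        print(delta, max_abs_gamma(delta))
```

The bound tends to zero as $\delta$ approaches $1/\sqrt{2}$, so that only nearly symmetrical curves can be bimodal when $\delta$ is close to that limit.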














Figs. 3 and 4 show the limiting curves $\gamma = 0$, $\delta = 1/\sqrt{2}$ and $\gamma = 0.533$, $\delta = 0.5$. Clearly
these limiting curves will have a nearly flat horizontal portion with an inflexion at
$y' = \sqrt{1-2\delta^2}$, i.e. at $y = \tfrac{1}{2}[1 + \sqrt{1-2\delta^2}]$. We also note that, if $\delta$ be fixed, as $\gamma$ increases,
the 'permanent' mode is always below $y = \tfrac{1}{2}[1 - \sqrt{1-2\delta^2}]$, the anti-mode, when present,
is between $\tfrac{1}{2}[1 - \sqrt{1-2\delta^2}]$ and $\tfrac{1}{2}[1 + \sqrt{1-2\delta^2}]$ and the secondary mode is above
$\tfrac{1}{2}[1 + \sqrt{1-2\delta^2}]$. Fig. 5 shows a symmetrical bimodal distribution, while Figs. 6, 7 and 8 show
typical unimodal curves of $S_B$. The boundary above which $(\beta_1, \beta_2)$ points correspond to
bimodal curves has been shown in Fig. 2, as far as it has been explored.

The moments of the distribution of $y$ are complicated in form. They are discussed in the
Appendix, where some numerical values are given. It is also shown in the Appendix that the
$(\beta_1, \beta_2)$ points of curves of the system $S_B$ cover the area between the log-normal line and the
straight line $\beta_2 - \beta_1 - 1 = 0$. Until sufficiently comprehensive tables are available, it is clear
that it will not be possible to fit curves of type $S_B$ by the method of moments. For practical
purposes the method of percentiles is the most convenient to use, though in certain special
cases it will be possible to apply the method of maximum likelihood in quite a simple manner.

The process of fitting is considerably simplified if one or both end-points of the distribution
of $x$ are known. We shall deal with the three cases of (a) both, (b) one, (c) neither of the
end-points known.










(a) Both end-points known 


In this case, both $\xi$ and $\lambda$ are known. Hence, given the value of $x$, the value of the
transformed variable $\log\{(x-\xi)/(\xi+\lambda-x)\}$ can be obtained directly. Corresponding to observed
values $x_1, x_2, \ldots, x_n$ there will be transformed values $f_1, f_2, \ldots, f_n$, where
$$f_i = \log\frac{x_i-\xi}{\xi+\lambda-x_i} \quad (i = 1, \ldots, n).$$
[Figs. 3-6. Curves of system $S_B$. Fig. 3: $\gamma = 0$, $\delta = 1/\sqrt{2}$; $\beta_1 = 0.00$, $\beta_2 = 1.87$. Fig. 4: $\gamma = 0.533$, $\delta = 0.5$; $\beta_1 = 0.42$, $\beta_2 = 2.13$; $\tfrac{1}{2}\{1+\sqrt{1-2\delta^2}\} = 0.85$. Fig. 5: $\gamma = 0$, $\delta = 0.5$; $\beta_1 = 0.00$, $\beta_2 = 1.63$. Fig. 6: $\gamma = 0$, $\delta = 2$; $\beta_1 = 0.00$, $\beta_2 = 2.63$.]


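The problem then reduces to that of fitting a normal curve to the observed $f_i$'s. Fitting this
curve by moments we have
$$\hat{\gamma} = -\bar{f}/s_f, \qquad \hat{\delta} = 1/s_f, \tag{27}$$
where
$$\bar{f} = \frac{1}{n}\sum_{i=1}^{n} f_i, \qquad s_f^2 = \frac{1}{n}\sum_{i=1}^{n}(f_i-\bar{f})^2.$$
This will give the maximum likelihood estimates for $\gamma$ and $\delta$.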
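For ungrouped data the calculation in case (a) is short enough to sketch directly. The code below assumes that $z = \gamma + \delta f$ is a unit normal variable, so that $f$ has mean $-\gamma/\delta$ and standard deviation $1/\delta$; the function name and the sample values in the usage lines are hypothetical.

```python
import math

def fit_sb_known_ends(xs, xi, lam):
    """Estimate (gamma, delta) of S_B when both end-points xi and xi + lam are known.

    Each observation is transformed to f = log{(x - xi)/(xi + lam - x)} and a
    normal curve is fitted to the f's by moments (divisor n).
    """
    fs = [math.log((x - xi) / (xi + lam - x)) for x in xs]
    n = len(fs)
    mean_f = sum(fs) / n
    sd_f = math.sqrt(sum((v - mean_f) ** 2 for v in fs) / n)
    return -mean_f / sd_f, 1.0 / sd_f

if __name__ == "__main__":
    sample = [0.12, 0.35, 0.41, 0.58, 0.77, 0.90]   # illustrative values in (0, 1)
    print(fit_sb_known_ends(sample, xi=0.0, lam=1.0))
```

Every observation must lie strictly between the two end-points, since the transformed value is infinite at either limit.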


However, a difficulty arises if the original data are not given in extenso but as a grouped
distribution. If the original groups (for the variable $x$) are of equal length, the transformed
groups (for the variable $f$) will be of unequal length and there will be groups of infinite length
at either end of the distribution. Moments calculated from such data would require corrections
which would be difficult to ascertain. The method of percentiles is very simple to apply
in this case. An application of the method is described in Example 1 (pp. 168, 169 below).






















































[Figs. 7 and 8. Curves of system $S_B$. Fig. 7: $\beta_1 = 0.53$, $\beta_2 = 2.91$. Fig. 8: $\gamma = 1$, $\delta = 2$; $\beta_1 = 0.08$, $\beta_2 = 2.77$.]
[Figs. 9-11. Curves of system $S_U$. Fig. 9: $\gamma = 0$, $\delta = 2$; $\beta_1 = 0$, $\beta_2 = 4.51$. Fig. 10: $\gamma = 1$, $\delta = 2$; $\beta_1 = 0.76$, $\beta_2 = 5.59$. Fig. 11: $\gamma = 1$, $\delta = 1$; $\beta_1 = 28.8$, $\beta_2 = 93.4$.]


(b) One end-point known 

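Suppose the lower end-point, i.e. $\xi$, to be known. In this case a convenient application
of the method of percentiles to estimate $\lambda$, $\gamma$ and $\delta$ is as follows:

From the data we estimate the median $x_0$ and the lower and upper $100P\,\%$ points $x_1$
and $x_2$. Then we have to estimate $\lambda$, $\gamma$ and $\delta$ from the equations
$$0 = \gamma + \delta\log\frac{x_0-\xi}{\xi+\lambda-x_0}, \quad -z_P = \gamma + \delta\log\frac{x_1-\xi}{\xi+\lambda-x_1}, \quad z_P = \gamma + \delta\log\frac{x_2-\xi}{\xi+\lambda-x_2}, \tag{28}$$
where
$$\frac{1}{\sqrt{2\pi}}\int_{z_P}^{\infty} e^{-\frac{1}{2}t^2}\,dt = P.$$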



From equations (28) we obtain
$$\frac{(x_0-\xi)^2}{(\xi+\lambda-x_0)^2} = \frac{(x_1-\xi)(x_2-\xi)}{(\xi+\lambda-x_1)(\xi+\lambda-x_2)}, \tag{29}$$
whence
$$\lambda = \frac{X_0(X_0X_1 + X_0X_2 - 2X_1X_2)}{X_0^2 - X_1X_2}, \tag{30}$$
where $X_i = x_i - \xi$ $(i = 0, 1, 2)$.

$\gamma$ and $\delta$ may then be found from (28). Alternatively, using the value of $\lambda$ obtained from (30),
$\gamma$ and $\delta$ may be estimated from the observed mean and standard deviation of
$\log\{(x-\xi)/(\xi+\lambda-x)\}$,
as in case (a).
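The percentile calculation of case (b) can be sketched as follows. The value of $\lambda$ is taken from (30); $\delta$ and $\gamma$ are then obtained from the outer and median equations of the three-percentile system. The function name is hypothetical, and `statistics.NormalDist` is used here in place of normal tables to obtain the deviate $z_P$ exceeded with probability $P$.

```python
import math
from statistics import NormalDist

def fit_sb_lower_end_known(x0, x1, x2, xi, p):
    """Estimate (lam, gamma, delta) of S_B from the median x0 and the lower and
    upper 100p% points x1 and x2, the lower end-point xi being known."""
    X0, X1, X2 = x0 - xi, x1 - xi, x2 - xi
    lam = X0 * (X0 * X1 + X0 * X2 - 2.0 * X1 * X2) / (X0**2 - X1 * X2)  # equation (30)
    z_p = NormalDist().inv_cdf(1.0 - p)          # normal deviate with upper tail area p

    def f(x):
        return math.log((x - xi) / (xi + lam - x))

    delta = 2.0 * z_p / (f(x2) - f(x1))          # difference of the two outer equations
    gamma = -delta * f(x0)                       # median equation: 0 = gamma + delta*f(x0)
    return lam, gamma, delta

if __name__ == "__main__":
    print(fit_sb_lower_end_known(x0=4.97, x1=2.85, x2=7.57, xi=1.0, p=0.10))
```

The estimate of $\lambda$ is sensitive to sampling errors in the three percentiles when $X_0^2$ is close to $X_1X_2$, so the result is best checked against the observed range.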
(c) Neither end-point known 
In this case all four parameters $\xi$, $\lambda$, $\gamma$ and $\delta$ have to be estimated. The method of percentile
points in this case requires that estimates be obtained of four values $x_A$, $x_B$, $x_C$, $x_D$ say, such
that certain fixed proportions $P_A$, $P_B$, $P_C$, $P_D$, respectively, of the distribution of $x$ fall below
these values. $\xi$, $\lambda$, $\gamma$ and $\delta$ have then to be found from the equations
$$z_i = \gamma + \delta\log\frac{x_i-\xi}{\xi+\lambda-x_i} \quad (i = A, B, C, D), \tag{31}$$
where
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z_i} e^{-\frac{1}{2}t^2}\,dt = P_i. \tag{32}$$
These equations may be solved by successive approximation. In Example 2 (pp. 169-171
below) only an approximate solution has been obtained. The values shown could be improved
by the standard method based on Taylor's expansion.


3-5. The system $S_U$
From (22) we have immediately
$$p(y) = \frac{\delta}{\sqrt{2\pi}}\,\frac{1}{\sqrt{y^2+1}}\exp\left[-\tfrac{1}{2}\left\{\gamma + \delta\log\left[y+\sqrt{y^2+1}\,\right]\right\}^2\right]. \tag{33}$$
Evidently $y^n p(y) \to 0$ as $y \to -\infty$ or $y \to +\infty$, so that there is 'high contact' at either end of
the infinite range of variation of $y$.

Inverting (22) we have
$$y = \tfrac{1}{2}\left(e^{(z-\gamma)/\delta} - e^{-(z-\gamma)/\delta}\right) = \sinh\left(\frac{z-\gamma}{\delta}\right). \tag{34}$$
Hence the median value of $y$ is $-\sinh(\gamma/\delta)$.

From (33) the equation to be satisfied by any modal value of $y$, other than the extremities
of the range of variation, is
$$y/\sqrt{1+y^2} = -\delta\left\{\gamma + \delta\log\left[y+\sqrt{y^2+1}\,\right]\right\}. \tag{35}$$
From graphical considerations it is evident that there is only one solution of (35), and that
this solution is between the median and zero. Hence when $\gamma$ is positive the mode is greater
than the median, implying negative skewness, and vice versa. Since the transformation
generating system $S_U$ is symmetrical about $y = 0$, and $f'(y) = (y^2+1)^{-\frac{1}{2}}$ is a decreasing
function of $|y|$, this is to be expected.
Figs. 9-11 show typical curves of system $S_U$.







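The moments of the system $S_U$ are determined with much greater facility than are those
of $S_B$. We have
$$\mu'_r(y) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{1}{2}z^2}\left[\tfrac{1}{2}\left(e^{(z-\gamma)/\delta} - e^{-(z-\gamma)/\delta}\right)\right]^r dz.$$
Hence if $r$ is even
$$\mu'_r = 2^{-(r-1)}\left[\sum_{s=0}^{\frac{1}{2}r-1}(-1)^s\binom{r}{s}e^{\frac{1}{2}(r-2s)^2\delta^{-2}}\cosh[(r-2s)(\gamma/\delta)] + \tfrac{1}{2}(-1)^{\frac{1}{2}r}\binom{r}{\frac{1}{2}r}\right], \tag{36-1}$$
and if $r$ is odd
$$\mu'_r = -2^{-(r-1)}\sum_{s=0}^{\frac{1}{2}(r-1)}(-1)^s\binom{r}{s}e^{\frac{1}{2}(r-2s)^2\delta^{-2}}\sinh[(r-2s)(\gamma/\delta)]. \tag{36-2}$$
From equations (36) it follows that
$$\begin{aligned}
\mu'_1 &= -\omega^{\frac{1}{2}}\sinh\Omega,\\
\mu_2 &= \tfrac{1}{2}(\omega-1)(\omega\cosh 2\Omega + 1),\\
\mu_3 &= -\tfrac{1}{4}\omega^{\frac{1}{2}}(\omega-1)^2\{\omega(\omega+2)\sinh 3\Omega + 3\sinh\Omega\},\\
\mu_4 &= \tfrac{1}{8}(\omega-1)^2\{\omega^2(\omega^4+2\omega^3+3\omega^2-3)\cosh 4\Omega + 4\omega^2(\omega+2)\cosh 2\Omega + 3(2\omega+1)\},
\end{aligned} \tag{37}$$
where $\omega = e^{\delta^{-2}}$, $\Omega = \gamma/\delta$.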
From (37) we see that if $\gamma$ is positive the inequalities mean < median < mode hold, while
if $\gamma$ is negative the direction of the inequalities is reversed. Also when $\gamma = 0$, $\beta_1 = 0$ (as should
be the case) and $\beta_2 = \tfrac{1}{2}(\omega^4 + 2\omega^2 + 3)$. As $\gamma$ tends to infinity, $\delta$ remaining fixed, we have
$$\lim_{\gamma\to\infty}\beta_1 = (\omega-1)(\omega+2)^2, \qquad \lim_{\gamma\to\infty}\beta_2 = \omega^4 + 2\omega^3 + 3\omega^2 - 3. \tag{38}$$
As $\gamma$ increases from zero to infinity, therefore, the $(\beta_1, \beta_2)$ point varies from $(0, \tfrac{1}{2}(\omega^4 + 2\omega^2 + 3))$
to a point on the $S_L$ line.
As $\delta$ decreases from infinity to zero, $\omega$ increases from unity to infinity. Hence the $(\beta_1, \beta_2)$
points for system $S_U$ cover the region of the $(\beta_1, \beta_2)$ plane 'below' the $S_L$ line, as sketched
in Fig. 2.
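The closed forms for the mean and the central moments of $S_U$ in terms of $\omega = e^{\delta^{-2}}$ and $\Omega = \gamma/\delta$ are easily evaluated. The sketch below is illustrative only: the function names are assumptions, and the values of $\gamma$ and $\delta$ in the usage lines are those quoted for Figs. 9 and 10.

```python
import math

def su_moments(gamma, delta):
    """Mean and central moments mu2, mu3, mu4 of y for system S_U."""
    w = math.exp(delta**-2)
    om = gamma / delta
    mean = -math.sqrt(w) * math.sinh(om)
    mu2 = 0.5 * (w - 1) * (w * math.cosh(2 * om) + 1)
    mu3 = -0.25 * math.sqrt(w) * (w - 1) ** 2 * (
        w * (w + 2) * math.sinh(3 * om) + 3 * math.sinh(om))
    mu4 = 0.125 * (w - 1) ** 2 * (
        w**2 * (w**4 + 2 * w**3 + 3 * w**2 - 3) * math.cosh(4 * om)
        + 4 * w**2 * (w + 2) * math.cosh(2 * om)
        + 3 * (2 * w + 1))
    return mean, mu2, mu3, mu4

def su_betas(gamma, delta):
    """Moment ratios beta_1 = mu3^2/mu2^3 and beta_2 = mu4/mu2^2."""
    _, mu2, mu3, mu4 = su_moments(gamma, delta)
    return mu3**2 / mu2**3, mu4 / mu2**2

if __name__ == "__main__":
    print(su_betas(0.0, 2.0))   # symmetrical case
    print(su_betas(1.0, 2.0))   # skew case
```

For positive $\gamma$ the mean $-\omega^{1/2}\sinh\Omega$ lies below the median $-\sinh\Omega$, since $\omega > 1$, in agreement with the ordering of mean and median stated in the text.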


The calculations involved not being too lengthy, fairly extensive series of values of the
mean, standard deviation, $\beta_1$ and $\beta_2$ for distributions of system $S_U$ were computed. They
are not reproduced here, but Fig. 12 is an abac based on these calculations. Using this abac
the parameters $\gamma$ and $\delta$ can be estimated, and $\xi$ and $\lambda$ then determined to give the required
mean and standard deviation (cf. Burr (1942)). Given $\gamma$ and $\delta$, it is not difficult to calculate
$\xi$ and $\lambda$ from (37). No further tables or abacs are, therefore, given. In the case of system
$S_B$, however, where $\mu'_1$ and $\sigma$ are not easily calculated, a second abac, giving $\mu'_1$ and $\sigma$ as
functions of $\gamma$ and $\delta$, would be required.

It may be noted that it is in the system $S_U$ that the necessity for estimation of all four
parameters is likely to be of most frequent occurrence. In the case of $S_L$ and $S_B$, $\xi$ (also $\lambda$
in $S_B$) often has an obvious and simple meaning. $\xi$ (and $\lambda$) may often be fixed in advance in
such cases. In general there is no such simple interpretation of $\xi$ and $\lambda$ in the case of $S_U$.
There is, indeed, the particular result that for the symmetrical curves of $S_U$, $x = \xi$ is the
axis of symmetry. Otherwise, the relation of $\xi$ and $\lambda$ to the position and size of the curves is
not simple.

Examples 3 and 4 (pp. 171, 172 below) describe the fitting of curves of system $S_U$ to
observational data. It appears that the curves give a good approximation to Pearson Type IV
curves. As $S_U$ is much easier to deal with than Type IV, especially with regard to the
computation of probabilities, it seems that $S_U$ might be used as an approximation to Type IV even
when the latter is considered the more reasonable curve to fit.








[Fig. 12. Abac for system $S_U$.]


3-6. The transformation applied to certain Pearson curves 


Since experience has shown that curves of the Pearson system are representative of a wide
range of frequency distributions met in practice, it is of interest to ask how far the application
of the $S_L$, $S_B$ and $S_U$ transformations to variables following distributions of this system will
result in a transformed variable following approximately the normal law. We shall suppose,
then, that $y$ follows a probability law of the Pearson system and compare $\beta_1(y)$, $\beta_2(y)$
and $\beta_1(z)$, $\beta_2(z)$ with the normal values 0, 3 respectively.

Some of the results obtained in this section coincide with those which Aroian (1941) and 
Wishart (1947) obtained in the course of work on the distribution of statistics employed in 
analysis of variance. 


(i) $S_L$ applied to Type III


1 
P ad Es he 
if mina, term 

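then the cumulants of $z = \gamma + \delta\log y$ are
$$\kappa_1(z) = \gamma + \delta\Psi(\nu), \qquad \kappa_r(z) = \delta^r\Psi^{(r-1)}(\nu) \quad (r \ge 2), \tag{39}$$
where $\Psi^{(s)}(\nu) = \dfrac{d^{s+1}\log\Gamma(\nu)}{d\nu^{s+1}}$ is the $(s+2)$-gamma function and we write $\Psi^{(0)}(\nu)$ as $\Psi(\nu)$.
Hence
$$\beta_1(z) = [\Psi^{(2)}(\nu)]^2[\Psi^{(1)}(\nu)]^{-3}, \qquad \beta_2(z) = 3 + \Psi^{(3)}(\nu)[\Psi^{(1)}(\nu)]^{-2}. \tag{40}$$
Using the asymptotic expansions for $\Psi^{(s)}(\nu)$ we obtain
$$\beta_1(z) \approx \nu^{-1}, \qquad \beta_2(z) \approx 3 + 2\nu^{-1} \tag{41}$$
(valid for $\nu$ not too small).

Since $\beta_1(y) = 8\nu^{-1}$ and $\beta_2(y) = 3 + 12\nu^{-1}$, we see that $S_L$ does produce a variable with shape
coefficients nearer the normal values than those of the original distribution, when applied
to Type III variables.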

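(ii) $S_L$ applied to Type VI
If
$$p(y) = \frac{\Gamma(\tau)}{\Gamma(\nu)\Gamma(\tau-\nu)}\,y^{\nu-1}(1+y)^{-\tau} \quad (0 < y < \infty),$$
then
$$\begin{aligned}
\beta_1(z) &= [\Psi^{(2)}(\nu) - \Psi^{(2)}(\tau-\nu)]^2\,[\Psi^{(1)}(\tau-\nu) + \Psi^{(1)}(\nu)]^{-3},\\
\beta_2(z) &= 3 + [\Psi^{(3)}(\tau-\nu) + \Psi^{(3)}(\nu)]\,[\Psi^{(1)}(\tau-\nu) + \Psi^{(1)}(\nu)]^{-2}.
\end{aligned} \tag{42}$$
If both $(\tau-\nu)$ and $\nu$ be sufficiently large we have
$$\begin{aligned}
\beta_1(z) &\approx \nu^{-1} + (\tau-\nu)^{-1} - 4\tau^{-1},\\
\beta_2(z) &\approx 3 + 2\nu^{-1} + 2(\tau-\nu)^{-1} - 6\tau^{-1},
\end{aligned} \tag{43}$$
which compare with
$$\begin{aligned}
\beta_1(y) &\approx 4\nu^{-1} - 4\tau^{-1} + 16(\tau-\nu)^{-1},\\
\beta_2(y) &\approx 3 + 6\nu^{-1} - 6\tau^{-1} + 30(\tau-\nu)^{-1}.
\end{aligned} \tag{44}$$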
(iii) $S_L$ applied to Type V
1 ; 
4 Gh sae —v+i) —ly 1 
Here p(y) Po! e (0<y<o) 


This case is similar to (ii), as is to be expected, since a Type V variable may be regarded
as the reciprocal of a Type III variable, while $S_L$ remains of the same form if $y$ be replaced
by its reciprocal.








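(iv) $S_L$ applied to Type I (and II)
If
$$p(y) = \frac{\Gamma(\nu+\tau)}{\Gamma(\nu)\Gamma(\tau)}\,y^{\nu-1}(1-y)^{\tau-1} \quad (0 < y < 1),$$
then
$$\begin{aligned}
\beta_1(z) &= [\Psi^{(2)}(\nu) - \Psi^{(2)}(\nu+\tau)]^2\,[\Psi^{(1)}(\nu) - \Psi^{(1)}(\nu+\tau)]^{-3},\\
\beta_2(z) &= 3 + [\Psi^{(3)}(\nu) - \Psi^{(3)}(\nu+\tau)]\,[\Psi^{(1)}(\nu) - \Psi^{(1)}(\nu+\tau)]^{-2}.
\end{aligned} \tag{45}$$
If $\nu$ and $\tau$ be sufficiently large then
$$\begin{aligned}
\beta_1(z) &\approx 4\tau^{-1} + \nu^{-1} - (\nu+\tau)^{-1},\\
\beta_2(z) &\approx 3 + 6\tau^{-1} + 2\nu^{-1} - 2(\nu+\tau)^{-1},
\end{aligned} \tag{46}$$
which may be compared with
$$\begin{aligned}
\beta_1(y) &\approx 4\nu^{-1} + 4\tau^{-1} - 16(\nu+\tau)^{-1},\\
\beta_2(y) &\approx 3 + 6\nu^{-1} + 6\tau^{-1} - 30(\nu+\tau)^{-1}.
\end{aligned} \tag{47}$$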
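(v) $S_B$ applied to Type I (and Type II)
If, as before,
$$p(y) = \frac{\Gamma(\nu+\tau)}{\Gamma(\nu)\Gamma(\tau)}\,y^{\nu-1}(1-y)^{\tau-1} \quad (0 < y < 1),$$
then
$$\begin{aligned}
\beta_1(z) &= [\Psi^{(2)}(\nu) - \Psi^{(2)}(\tau)]^2\,[\Psi^{(1)}(\nu) + \Psi^{(1)}(\tau)]^{-3},\\
\beta_2(z) &= 3 + [\Psi^{(3)}(\nu) + \Psi^{(3)}(\tau)]\,[\Psi^{(1)}(\nu) + \Psi^{(1)}(\tau)]^{-2}.
\end{aligned} \tag{48}$$
If $\nu$ and $\tau$ both be sufficiently large, then
$$\begin{aligned}
\beta_1(z) &\approx \nu^{-1} + \tau^{-1} - 4(\nu+\tau)^{-1},\\
\beta_2(z) &\approx 3 + 2\nu^{-1} + 2\tau^{-1} - 6(\nu+\tau)^{-1}.
\end{aligned} \tag{49}$$
These formulae may be compared with (46) and (47).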
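The exact moment ratios of the transformed Type I variable can be evaluated numerically from the polygamma expressions. In the sketch below the cumulants of $\log y$ are taken as $\Psi^{(r-1)}(\nu) - \Psi^{(r-1)}(\nu+\tau)$ and those of $\log\{y/(1-y)\}$ as $\Psi^{(r-1)}(\nu) + (-1)^r\Psi^{(r-1)}(\tau)$. The polygamma routine is a simple series with an Euler-Maclaurin tail, written here for illustration and not taken from any library; all names are assumptions.

```python
import math

def polygamma(n, x, terms=2000):
    """Psi^(n)(x) for n >= 1 and x > 0, as (-1)^(n+1) n! sum_{k>=0} (x+k)^-(n+1).

    The first `terms` terms are summed directly and the remainder is
    approximated by the leading Euler-Maclaurin terms.
    """
    m = n + 1
    head = sum((x + k) ** -m for k in range(terms))
    a = x + terms
    tail = a ** (1 - m) / (m - 1) + 0.5 * a ** -m + m * a ** (-m - 1) / 12.0
    return (-1) ** (n + 1) * math.factorial(n) * (head + tail)

def betas_after_sl(nu, tau):
    """beta_1, beta_2 of z = gamma + delta*log(y) for a Type I variable y."""
    k2 = polygamma(1, nu) - polygamma(1, nu + tau)
    k3 = polygamma(2, nu) - polygamma(2, nu + tau)
    k4 = polygamma(3, nu) - polygamma(3, nu + tau)
    return k3**2 / k2**3, 3.0 + k4 / k2**2

def betas_after_sb(nu, tau):
    """beta_1, beta_2 of z = gamma + delta*log{y/(1-y)} for a Type I variable y."""
    k2 = polygamma(1, nu) + polygamma(1, tau)
    k3 = polygamma(2, nu) - polygamma(2, tau)
    k4 = polygamma(3, nu) + polygamma(3, tau)
    return k3**2 / k2**3, 3.0 + k4 / k2**2

if __name__ == "__main__":
    for nu, tau in ((2, 2), (4, 4), (2, 4)):
        print(nu, tau, betas_after_sl(nu, tau), betas_after_sb(nu, tau))
```

Neither ratio depends on $\gamma$ or $\delta$, since a linear change of $z$ leaves $\beta_1$ and $\beta_2$ unaltered.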


As would be expected, it appears that the $S_B$ transformation generally produces a closer
approach to normality than does the $S_L$ transformation applied to the same Type I (or Type
II) variable. In particular, if the original variable be symmetrically distributed, $S_B$ preserves
the symmetry while $S_L$ does not. Table 2 below provides numerical comparisons in a number
of special cases. The fifth and sixth lines of this table indicate that, as is to be expected, the
relative superiority of $S_B$ diminishes as the $(\beta_1, \beta_2)$ points of the Pearson curves approach
the Type III line (and are hence nearer the log-normal line).

Table 2

                      $y$                $z$ in $S_L$          $z$ in $S_B$
  $\nu$   $\tau$    $\beta_1$   $\beta_2$     $\beta_1$   $\beta_2$     $\beta_1$   $\beta_2$
   2     2      0.000    2.143      2.233    6.444      0.000    3.594
   4     4      0.000    2.455      1.136    4.769      0.000    3.278
   6     6      0.000    2.600      0.757    4.180      0.000    3.180
   2     4      0.219    2.625      1.384    5.243      0.131    3.625
   2     6      0.480    3.109      1.114    4.868      0.244    3.741
   4     6      0.051    2.608      0.835    4.333      0.022    3.262

Fisher's $z'$ transformation for the correlation coefficient, in the case $\rho = 0$, provides an
example of the application of $S_B$ to a Type I variable. We have, putting $r = 2R - 1$,
$$p(R) = \frac{\Gamma(n-2)}{\{\Gamma(\frac{1}{2}n-1)\}^2}\,R^{\frac{1}{2}(n-4)}(1-R)^{\frac{1}{2}(n-4)} \quad (0 < R < 1),$$
$$z = \tfrac{1}{2}\sqrt{n-3}\,\log\frac{1+r}{1-r} = \tfrac{1}{2}\sqrt{n-3}\,\log\frac{R}{1-R}.$$
Hence
$$\begin{aligned}
\sigma^2(z) &= \tfrac{1}{2}(n-3)\,\Psi^{(1)}(\tfrac{1}{2}n-1),\\
\beta_1(z) &= 0,\\
\beta_2(z) &= 3 + \tfrac{1}{2}\Psi^{(3)}(\tfrac{1}{2}n-1)\,[\Psi^{(1)}(\tfrac{1}{2}n-1)]^{-2}.
\end{aligned} \tag{50}$$
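(vi) $S_U$ applied to Type VII
If
$$p(y) = \frac{\Gamma(\nu)}{\sqrt{\pi}\,\Gamma(\nu-\frac{1}{2})}\,(1+y^2)^{-\nu} \quad (-\infty < y < +\infty),$$
then
$$\beta_1(z) = 0, \qquad \beta_2(z) = 3 + \tfrac{1}{2}\Psi^{(3)}(\nu-\tfrac{1}{2})\,[\Psi^{(1)}(\nu-\tfrac{1}{2})]^{-2}. \tag{51}$$
Table 3 compares the $\beta_2$'s of corresponding distributions of $y$ and $z$ for various values of $\nu$.

Table 3

  $\nu$    $\beta_2(y)$   $\beta_2(z)$       $\nu$    $\beta_2(y)$   $\beta_2(z)$
   1      $\infty$     5.000          4      5.000     3.322
   2      $\infty$     3.806          5      4.200     3.245
   3      9.000     3.466          6      3.857     3.197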
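The entries of Table 3 can be checked from (51). The sketch below assumes the Type VII density proportional to $(1+y^2)^{-\nu}$, for which $\beta_2(y) = 3 + 6/(2\nu-5)$ when $\nu > 2\tfrac{1}{2}$; the polygamma routine is again a plain series with an Euler-Maclaurin tail, written for illustration, and the function names are hypothetical.

```python
import math

def polygamma(n, x, terms=2000):
    """Psi^(n)(x) for n >= 1 and x > 0, by direct summation plus a tail correction."""
    m = n + 1
    head = sum((x + k) ** -m for k in range(terms))
    a = x + terms
    tail = a ** (1 - m) / (m - 1) + 0.5 * a ** -m + m * a ** (-m - 1) / 12.0
    return (-1) ** (n + 1) * math.factorial(n) * (head + tail)

def beta2_y(nu):
    """beta_2 of the Type VII variable y; infinite unless nu > 2.5."""
    return 3.0 + 6.0 / (2.0 * nu - 5.0) if nu > 2.5 else math.inf

def beta2_z(nu):
    """beta_2 of z = gamma + delta*arcsinh(y), equation (51)."""
    a = nu - 0.5
    return 3.0 + 0.5 * polygamma(3, a) / polygamma(1, a) ** 2

if __name__ == "__main__":
    for nu in (1, 2, 3, 4, 5, 6):
        print(nu, beta2_y(nu), round(beta2_z(nu), 3))
```

The transformed variable has a finite $\beta_2$ even where the fourth moment of $y$ does not exist, which is the point of the comparison.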


























The above discussion has been concerned only with the shape of the distribution of the
transformed variable, as judged by the values of $\beta_1(z)$ and $\beta_2(z)$. Even where these quantities
differ appreciably from the normal values of 0 and 3, it is, however, possible that the
transformed probability integral could be regarded as normal for practical purposes. Further
investigation on this point would be of interest.


4. CONCLUSION 


4-1. Critical summary 


The following comments may prove helpful in assessing the value of the systems of curves
described in the foregoing pages:

(i) The systems $S_L$, $S_B$, $S_U$, together with the normal curve combine to give a variety of
shapes of curve as wide as that provided by the systems of frequency curves in general use.

(ii) The fundamental property that each of these curves may be transformed to a normal
curve by a simple transformation may be regarded either as a practical convenience, or as
a desirable property based on the arguments of Kapteyn and Wicksell. The first of these
reasons is of considerable importance, but it should be noted that simple exact tests of
significance are, even theoretically, obtainable only for a very restricted range of problems
(Bartlett, 1947).

(iii) $S_L$ is, of course, a well-established system. Of the systems $S_B$ and $S_U$, $S_B$ is based on
the simpler transformation, but $S_U$ has the simpler expressions for its moments. The fitting
of $S_U$ seems to be more straightforward than that of $S_B$, except when the limits of variation
of the latter are definitely known.

(iv) Curves of $S_L$, $S_B$ and $S_U$ all have high contact at the extremes of their range of
variation. This may sometimes prove a drawback at the finite limits of variation for systems
$S_L$ and $S_B$.

(v) Except for discrepancies at the ends, which may be associated with (iv), curves of
$S_L$, $S_B$ and $S_U$ agree generally with Pearson curves having the same (or nearly the same)
first four moments. The use of the former curves may sometimes be considered simply as
a convenient aid in calculating rough figures for subrange frequencies of the latter curve.
In a note added to the paper by Pretorius (1930), K. Pearson suggests the use of $S_L$ in this
capacity relative to certain Type VI curves.


4-2. Other translation systems 


A further point of interest arises in the fact that all the moments of all curves in the
systems $S_L$, $S_B$ and $S_U$ are finite. By comparison, it is known that the higher moments of
certain of the Pearson curves can be infinite. While finiteness of moments is in many respects
a desirable property, it may be argued that such finiteness might restrict the systems
relatively to curves with very long tails. It may be noted that such curves might be covered
by choosing a different distribution for $z$. In particular, we might suppose $z$ to be distributed
according to the first law of Laplace
$$p(z) = \tfrac{1}{2}e^{-|z|} \quad (-\infty < z < \infty). \tag{52}$$
Frechet (1928, 1939) has suggested that more use might be made of this law; his arguments,
combined with those of Kapteyn, would lead to systems of curves defined by
$$z = \gamma + \delta f\left(\frac{x-\xi}{\lambda}\right), \tag{53}$$
with $p(z)$ given by (52). Inserting the particular forms for $f\{(x-\xi)/\lambda\}$, we would obtain systems
$S'_L$, $S'_B$, $S'_U$ corresponding to the systems $S_L$, $S_B$, $S_U$. It is easy to show that the system $S'_U$
can have infinite moments.

Clearly, by choosing fresh forms for p(z) a great variety of systems of curves could be 
constructed, but practical considerations naturally limit those cases which it is worth while 
to study. Mention may be made of the work of Olshen (1938), who has investigated certain 
transformations of the Pearson Type III distribution. 


5. NUMERICAL EXAMPLES 
Example 1 


For this example data used by Pearse (1928) were employed. The data gave the distribution 
of cloudiness at Greenwich for the period 1890-1904, excluding 1901. The last column, 
headed Type I, gives the frequencies obtained from a Pearson Type I curve fitted to the data 
by Pearse. Three moments were used in fitting this curve, the length of the range of variation 
being fixed in advance. 

Curve $S_B(1)$, the frequencies for which are shown in the third column, was fitted to the
observed data on the assumption that the degrees of cloudiness stated as 0, 1, 2, ..., 10 could
be regarded as corresponding to groups −0.5 to 0.5, 0.5 to 1.5, ..., 9.5 to 10.5. $\xi$ was thus




























N. L. JoHNSON 169 


fixed at −0.5 and $\lambda$ at 11.0. $\gamma$ and $\delta$ were then chosen to give exact agreement in the two
extreme groups. These values were

$$\gamma = -0.3110, \qquad \delta = 0.25166.$$














Table 4

Degree of      Observed
cloudiness     frequencies     $S_B(1)$     $S_B(2)$     Type I

     0             320          320.0        320.0       321.7
     1             129          100.9        120.9       121.5
     2              74           73.9         72.0        75.1
     3              68           63.8         57.5        61.4
     4              45           59.8         52.1        56.0
     5              45           59.9         51.6        55.2
     6              55           63.4         54.9        57.8
     7              65           72.0         63.9        65.5
     8              90           90.0         85.5        83.2
     9             148          135.4        160.7       139.6
    10             676          676.0        676.0       678.0

  Total           1715         1715.1       1714.9      1715.0

Value of $\chi^2$    —            18.44         5.76        6.52





























Curve $S_B(2)$ was fitted in the same way, except that it was assumed that the successive
groups were 0 to 0.5, 0.5 to 1.5, ..., 9.5 to 10. $\xi$ was therefore put equal to 0 and $\lambda$ put equal
to 10. The values of $\gamma$ and $\delta$ obtained were

$$\gamma = -0.3110 \text{ (as before)}, \qquad \delta = 0.19681.$$
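The two-group fit described above can be sketched numerically. The following is a minimal illustration, assuming the $S_B$ transformation $z = \gamma + \delta\log\{(x-\xi)/(\xi+\lambda-x)\}$ with natural logarithms and the group boundaries stated in the text; the function name `fit_sb_from_extremes` is hypothetical and not part of the paper.

```python
from math import log
from statistics import NormalDist


def fit_sb_from_extremes(p_low, p_high, x_low, x_high, xi, lam):
    """Choose gamma, delta so the fitted S_B curve reproduces two cumulative
    proportions exactly: p_low below x_low and p_high below x_high."""
    nd = NormalDist()
    z_low, z_high = nd.inv_cdf(p_low), nd.inv_cdf(p_high)
    u_low = log((x_low - xi) / (xi + lam - x_low))
    u_high = log((x_high - xi) / (xi + lam - x_high))
    delta = (z_high - z_low) / (u_high - u_low)
    gamma = z_low - delta * u_low
    return gamma, delta


TOTAL = 1715                      # total observed frequency
P_FIRST = 320 / TOTAL             # proportion in the group "0"
P_BELOW_LAST = 1 - 676 / TOTAL    # proportion below the group "10"

# S_B(1): groups -0.5 to 0.5, ..., 9.5 to 10.5, so xi = -0.5, lambda = 11
gamma1, delta1 = fit_sb_from_extremes(P_FIRST, P_BELOW_LAST, 0.5, 9.5, -0.5, 11.0)
# S_B(2): groups 0 to 0.5, ..., 9.5 to 10, so xi = 0, lambda = 10
gamma2, delta2 = fit_sb_from_extremes(P_FIRST, P_BELOW_LAST, 0.5, 9.5, 0.0, 10.0)

print(round(gamma1, 4), round(delta1, 4))
print(round(gamma2, 4), round(delta2, 4))
```

Under these assumptions both fits give the same $\gamma$, since the two boundaries are placed symmetrically within the range in each case; only $\delta$ changes.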


$S_B(2)$ gives a much better fit than $S_B(1)$ and, on the whole, a better fit than the Type I
curve. All three curves fail to give sufficiently small frequencies in the trough in the centre
of the distribution.


Example 2 
The data used in this example relate to the age of Australian mothers at birth of a child
(single births only) in the period 1922-6. Pretorius (1930) fitted a Type I curve to these data.
The values of the moment ratios ($\beta_1 = 0.101$, $\beta_2 = 2.430$) indicate that a curve of system
$S_B$ might be fitted. In this case, however, there are no obvious values to assign to the
parameters $\xi$ and $\lambda$. The method actually adopted (like that of Pretorius) was based on trial and
error. It was decided to attempt a fit which would give nearly correct values for the 5, 30, 70
and 95 % points of the distribution. Values were assigned to $\xi$ and $\lambda$ and then values of $\gamma$
and $\delta$ obtained, so that the specified percentage points were, as far as possible, unaltered.
The process was repeated to improve the fit. Details of the working are now given in respect
of the curve finally fitted.
For this curve $\xi = 15.0$, $\lambda = 36.5$.





170 Frequency curves generated by methods of translation 
From a cumulative diagram the following percentage points of the observed distribution
were estimated:

     5 % point  20.3 years        70 % point  32.8 years
    30 % point  25.6 years        95 % point  40.4 years

With $\xi = 15.0$, $\lambda = 36.5$, the values of $\log\{(x-\xi)/(\xi+\lambda-x)\}$ at these points are:
−0.7096, −0.4192, −0.0247, 0.3918 respectively.

The normal equivalent deviates for 5, 30, 70 and 95 % are:

−1.6449, −0.5244, 0.5244, 1.6449 respectively.
Hence $\gamma$ and $\delta$ should satisfy the four equations

    −1.6449 = $\gamma$ − 0.7096$\delta$,    (i)
    −0.5244 = $\gamma$ − 0.4192$\delta$,    (ii)
     0.5244 = $\gamma$ − 0.0247$\delta$,    (iii)
     1.6449 = $\gamma$ + 0.3918$\delta$.    (iv)

From (i) and (iv) we obtain $\gamma = 0.5978$, $\delta = 1.2649$; from (ii) and (iii) we obtain $\gamma = 0.5857$,
$\delta = 1.2424$. For the curve to be fitted, we took the values

$$\gamma = 0.5918, \qquad \delta = 1.2536.$$
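The percentage-point method can be retraced as a short calculation. The sketch below is illustrative only: it assumes natural logarithms (the text does not state the base) and recomputes the log-ratios directly from the four estimated percentage points rather than taking them from the printed list; the helper name `solve_pair` is hypothetical.

```python
from math import log
from statistics import NormalDist

XI, LAM = 15.0, 36.5
# Estimated percentage points of the observed distribution (years)
POINTS = {0.05: 20.3, 0.30: 25.6, 0.70: 32.8, 0.95: 40.4}


def log_ratio(x):
    """log{(x - xi)/(xi + lambda - x)}, natural logarithm assumed."""
    return log((x - XI) / (XI + LAM - x))


def solve_pair(p_a, p_b):
    """Solve z = gamma + delta * log_ratio(x) at two percentage points."""
    nd = NormalDist()
    z_a, z_b = nd.inv_cdf(p_a), nd.inv_cdf(p_b)
    u_a, u_b = log_ratio(POINTS[p_a]), log_ratio(POINTS[p_b])
    delta = (z_b - z_a) / (u_b - u_a)
    gamma = z_a - delta * u_a
    return gamma, delta


gamma_outer, delta_outer = solve_pair(0.05, 0.95)   # equations (i) and (iv)
gamma_inner, delta_inner = solve_pair(0.30, 0.70)   # equations (ii) and (iii)
gamma_fit = 0.5 * (gamma_outer + gamma_inner)
delta_fit = 0.5 * (delta_outer + delta_inner)

print(round(gamma_outer, 4), round(delta_outer, 4))
print(round(gamma_inner, 4), round(delta_inner, 4))
print(round(gamma_fit, 4), round(delta_fit, 4))
```

Averaging the two solutions is one simple way to arrive at a compromise pair close to the values adopted in the text; the paper does not say how its final values were chosen.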
The table below compares the observed distribution, the distribution corresponding to the
fitted $S_B$ curve and the Type I distribution fitted by Pretorius.

















Table 5

Age of mother
(central values     Observed
   in years)        frequencies       $S_B$        Type I

      13                   3            —             —
      15                 191           33            46
      17               4,573        4,697         6,105
      19              21,322       22,510        22,871
      21              42,758       44,048        41,996
      23              62,620       60,994        58,455
      25              73,423       71,123        69,796
      27              74,834       74,662        75,176
      29              72,640       73,063        74,822
      31              65,182       67,476        69,637
      33              58,407       59,236        60,909
      35              48,834       49,380        50,071
      37              39,932       38,837        38,524
      39              31,050       28,491        27,485
      41              18,975       19,530        17,878
      43              11,283       10,594        10,359
      45               4,365        5,179         5,088
      47               1,072        1,619         1,943
      49                 199          208           476
      51                  13            2            44
      53                   4            —             —
      55                   2            —             —

    Total            631,682      631,682       631,682




























Neither the $S_B$ curve nor the Type I curve fits well at the ends of the distribution. The
$S_B$ curve appears to give the closer fit in the central part of the distribution. For the range
with central values 17–47 inclusive, we have

    $S_B$ curve: $\chi^2 = 718$;    Type I curve: $\chi^2 = 1759$.

Excluding both groups 17 and 47 (central values), we have

    $S_B$ curve: $\chi^2 = 530$;    Type I curve: $\chi^2 = 1375$.

As is almost invariably found to be the case when dealing with very large samples,
exceedingly high values of $\chi^2$ are obtained. Differences between observation and theory, which
may be practically unimportant from the point of view of graduation and which might not
be picked out in samples of more usual size, are statistically significant having regard to the
large numbers involved.


Example 3 
The data on length and breadth of beans used in this example and in Example 4 were used
by Pretorius (1930). In this case we shall fit a curve of system $S_U$ to the distribution of lengths
of beans. The mean, standard deviation, $\beta_1$ and $\beta_2$ of the observed distribution are

    Mean = 14.399 mm.;                  $\beta_1 = 0.829$;
    Standard deviation = 0.9036 mm.;    $\beta_2 = 4.863$.

Using the abac of Fig. 12 we find

$$\delta = 2.64; \qquad \Omega = \gamma/\delta = 0.90,$$

whence $\gamma = 2.38$.

From (37) we calculate

$$\mu'_1\left(\frac{x-\xi}{\lambda}\right) = -1.1029, \qquad \sigma\left(\frac{x-\xi}{\lambda}\right) = 0.5948.$$

Hence

$$\lambda = \frac{0.9036}{0.5948} = 1.5192,$$

$$\xi = 14.399 + 1.1029\lambda = 16.0745.$$
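The step from $(\delta, \Omega)$ to $(\lambda, \xi)$ can be checked with a few lines of arithmetic. This sketch assumes that formula (37), which lies outside this section, gives the usual $S_U$ moments for $y = \sinh\{(z-\gamma)/\delta\}$, namely mean $-\omega^{1/2}\sinh\Omega$ and variance $\tfrac12(\omega-1)(\omega\cosh 2\Omega+1)$ with $\omega = e^{\delta^{-2}}$; the function name is hypothetical.

```python
from math import cosh, exp, sinh, sqrt


def su_standard_mean_sd(omega_ratio, delta):
    """Mean and s.d. of y = (x - xi)/lambda for an S_U curve, where
    omega_ratio = gamma/delta (standard S_U moment formulae, assumed)."""
    w = exp(delta ** -2)
    mean = -sqrt(w) * sinh(omega_ratio)
    var = 0.5 * (w - 1.0) * (w * cosh(2.0 * omega_ratio) + 1.0)
    return mean, sqrt(var)


OBS_MEAN, OBS_SD = 14.399, 0.9036      # observed mean and s.d. (mm.)
mean_y, sd_y = su_standard_mean_sd(0.90, 2.64)

lam = OBS_SD / sd_y                    # match the standard deviations
xi = OBS_MEAN - lam * mean_y           # match the means

print(round(mean_y, 4), round(sd_y, 4))
print(round(lam, 4), round(xi, 4))
```

Matching the first two moments in this way fixes $\lambda$ and $\xi$ once $\gamma$ and $\delta$ have been read from the chart.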


There is necessarily some uncertainty in the determination of $\gamma$ and $\delta$ from the chart, but
investigation indicated that the degree of uncertainty should not seriously affect the fitted
frequencies. Table 6 shows the observed frequencies, the frequencies calculated from the
fitted $S_U$ curve, and the frequencies calculated for the Type IV curve fitted by Pretorius.*


Example 4 


A curve of system $S_U$ is fitted to the distribution of breadth of beans referred to in
Example 3. For the observed distribution

    Mean = 7.9755 mm.;                  $\beta_1 = 0.1943$;
    Standard deviation = 0.3399 mm.;    $\beta_2 = 3.6544$.

Following the same procedure as in Example 3 the following values of the parameters of
the $S_U$ curve were obtained:

$$\delta = 3.55, \quad \gamma = 2.13, \quad \lambda = 0.9721, \quad \xi = 8.6195.$$

Table 7 compares observed frequencies, the fitted frequencies calculated from the $S_U$
curve and those calculated from the Type IV curve fitted by Pretorius.*

* The groupings in the tails used in calculating $\chi^2$ are shown by the braces to the right of the table.





























































Table 6

    Length
(central values     Observed
    in mm.)         frequencies      $S_U$       Type IV

   < 9.25               —              2.6          1.9
     9.5                1              2.7          2.6
    10.0                7              5.8          5.4
    10.5               18             12.1         11.3
    11.0               36             25.7         24.2
    11.5               70             55.2         52.5
    12.0              115            118.0        113.8
    12.5              199            249.3        243.7
    13.0              437            508.7        503.6
    13.5              929            970.6        968.9
    14.0             1787           1642.5       1638.9
    14.5             2294           2240.6       2229.8
    15.0             2082           2130.3       2132.6
    15.5             1129           1151.5       1181.6
    16.0              275            290.1        299.3
    16.5               55             32.2         28.5
    17.0                6              2.0
  > 17.25               —              0.1

    Total            9440           9440.0       9440.0

Value of $\chi^2$       —             87.1*       102.5†

* Excluding the 'over 16.25' group, $\chi^2 = 66.3$.
† Excluding the 'over 16.25' group, $\chi^2 = 70.1$.
Table 7

    Breadth
(central values     Observed
    in mm.)         frequencies      $S_U$       Type IV

   < 6.25               —              1.3
     6.375              4              3.7
     6.625             10             13.8         13.3
     6.875             72             53.2         49.9
     7.125            170            182.2        177.2
     7.375            530            557.8        557.9
     7.625           1397           1394.2       1413.0
     7.875           2579           2507.0       2530.5
     8.125           2742           2757.0       2732.5
     8.375           1483           1544.4       1515.4
     8.625            400            381.5        393.6
     8.875             48             41.5         48.6
     9.125              5              2.3
   > 9.25               —              0.1

    Total            9440           9440.0       9440.0

Value of $\chi^2$       —             17.47        14.36



















APPENDIX 
Moments of distributions in system $S_B$

If $z = \gamma + \delta\log\{y/(1-y)\}$ and $z$ is a unit normal variable, then the $r$th moment of $y$ about zero is

$$\mu'_r(y) = \frac{1}{\sqrt{(2\pi)}}\int_{-\infty}^{\infty} e^{-\frac12 z^2}\,(1+e^{-(z-\gamma)/\delta})^{-r}\,dz. \tag{54}$$

This integral is not easy to evaluate directly, and values of $\mu'_1$, $\mu'_2$, $\mu'_3$ and $\mu'_4$ were obtained by the following
steps.

(i) For the case $r = 1$, the expected value

$$\mu'_1 = \frac{1}{\sqrt{(2\pi)}}\int_{-\infty}^{\infty} e^{-\frac12 z^2}\,(1+e^{-(z-\gamma)/\delta})^{-1}\,dz \tag{55}$$

can be evaluated directly using a result due to Mordell (1920, 1933). To throw (55) into a form to which
Mordell's result applies, we make the transformation $z - \gamma = -2\pi\delta t$ leading to


m= V(2m)ae-t>* | er iv? —anvt( | 4 e2mt)—I de, 





where v=—yd, p= 27d%i. 
By Mordell’s formula 
l i 2 nm 2nniviy a i? 4 gl2n -laiy 
Bh = (27) de-t* — Te fs z __ ll _— i p » q a] —. 1 
Dols Www 1+a a cll 
where q=e"¥, gq, =e-7¥ 
wo - m 
and Ooo, Yr) = X erty ednmiv — 142 > ce’ cos Qnav. 
—-@ n=1 


After some algebraic simplification, this leads to the result 


a) 9 ow 
od n(1—2yé) n 4 
jo14d° 2D ee en tn! a —— 25 Y e-ken-v'n 


, 
h 








ri = 5 a a = *S* sin (2n — 1) wy cosech (2n — 1)m8? 
= pani? 0 
2 . 2 2 
v(27) 14+2 XD e-2""2°6* cos Qnayd 
n=1 
(56) 
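(ii) The higher moments can be expressed in terms of the partial derivatives of the first moment with
respect to $\gamma$. From (54)

$$\frac{\partial\mu'_r}{\partial\gamma} = -\frac{r}{\delta\sqrt{(2\pi)}}\int_{-\infty}^{\infty} \frac{e^{-\frac12 z^2}\,e^{-(z-\gamma)/\delta}}{(1+e^{-(z-\gamma)/\delta})^{r+1}}\,dz
= -\frac{r}{\delta\sqrt{(2\pi)}}\int_{-\infty}^{\infty} \frac{e^{-\frac12 z^2}\{(1+e^{-(z-\gamma)/\delta})-1\}}{(1+e^{-(z-\gamma)/\delta})^{r+1}}\,dz
= -\frac{r}{\delta}\,(\mu'_r-\mu'_{r+1}).$$

Hence

$$\mu'_{r+1} = \mu'_r + \frac{\delta}{r}\frac{\partial\mu'_r}{\partial\gamma}. \tag{57}$$

Applying this formula with $r = 1, 2, 3$ successively we obtain

$$\begin{aligned}
\mu'_2 &= \mu'_1 + \delta\frac{\partial\mu'_1}{\partial\gamma},\\
\mu'_3 &= \mu'_1 + \tfrac32\delta\frac{\partial\mu'_1}{\partial\gamma} + \tfrac12\delta^2\frac{\partial^2\mu'_1}{\partial\gamma^2},\\
\mu'_4 &= \mu'_1 + \tfrac{11}{6}\delta\frac{\partial\mu'_1}{\partial\gamma} + \delta^2\frac{\partial^2\mu'_1}{\partial\gamma^2} + \tfrac16\delta^3\frac{\partial^3\mu'_1}{\partial\gamma^3}.
\end{aligned} \tag{58}$$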













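Formulae (58), together with (56), make it possible to calculate $\mu'_2$, $\mu'_3$ and $\mu'_4$. Although the analytical
expressions for these moments must be very complicated, their numerical computation is straightforward,
though tedious.

(iii) The computational labour may be reduced by using the recurrence formula developed below. To
emphasize the dependence of $\mu'_r$ on $\gamma$ and $\delta$, we shall write

$$\mu'_r(\gamma,\delta) = \frac{1}{\sqrt{(2\pi)}}\int_{-\infty}^{\infty} e^{-\frac12 z^2}\,(1+e^{-(z-\gamma)/\delta})^{-r}\,dz.$$

Then

$$\mu'_r(\gamma,\delta) = \frac{1}{\sqrt{(2\pi)}}\int_{-\infty}^{\infty} e^{-\frac12 z^2}\{(1+e^{-(z-\gamma)/\delta}) - e^{-(z-\gamma)/\delta}\}\,(1+e^{-(z-\gamma)/\delta})^{-(r+1)}\,dz
= \mu'_{r+1}... $$

that is, replacing $r+1$ by $r$,

$$\mu'_r(\gamma,\delta) = \mu'_{r-1}(\gamma,\delta) - e^{\frac12\delta^{-2}+\gamma\delta^{-1}}\,\mu'_r(\gamma+\delta^{-1},\delta).$$

Hence

$$\mu'_r(\gamma+\delta^{-1},\delta) = e^{-\frac12\delta^{-2}-\gamma\delta^{-1}}\,[\mu'_{r-1}(\gamma,\delta)-\mu'_r(\gamma,\delta)]. \tag{59}$$

Remembering that $\mu'_0 = 1$ for all $\gamma$ and $\delta$, (59) makes it possible for the first four moments to be
calculated with fair rapidity for series of values of $\gamma$ at intervals of $\delta^{-1}$.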

A few of the calculated values of $\beta_1$ and $\beta_2$ are shown in Table 8. Generally the moments were calculated
by methods (i) and (ii) for values of $\gamma$ between 0 and $\delta^{-1}$; further results were then obtained by means of
(iii). It was necessary to take care to avoid accumulation of errors in applying (iii). The first sets of
moments were calculated to eleven decimal places, and the values of $\beta_1$ and $\beta_2$ obtained should be accurate
to the five places of decimals shown.
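As a cross-check on the tabulated values, the moments can also be obtained by direct numerical integration of (54). The sketch below is not the method of the paper: it swaps in composite Simpson quadrature for steps (i) and (ii), and all function names are hypothetical. It also checks the recurrence (59) numerically.

```python
from math import exp, pi, sqrt


def sb_raw_moment(r, gamma, delta, half_width=12.0, steps=4000):
    """r-th moment about zero of y = 1/(1 + exp(-(z - gamma)/delta)), z ~ N(0, 1),
    by composite Simpson's rule on [-half_width, half_width] (steps must be even)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        z = -half_width + k * h
        weight = 1.0 if k in (0, steps) else (4.0 if k % 2 else 2.0)
        y = 1.0 / (1.0 + exp(-(z - gamma) / delta))
        total += weight * exp(-0.5 * z * z) * y ** r
    return total * h / (3.0 * sqrt(2.0 * pi))


def sb_summary(gamma, delta):
    """Return (mean, standard deviation, beta_1, beta_2) of y."""
    m1, m2, m3, m4 = (sb_raw_moment(r, gamma, delta) for r in (1, 2, 3, 4))
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return m1, sqrt(mu2), mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2


def recurrence_rhs(r, gamma, delta):
    """Right-hand side of (59): the r-th moment at gamma + 1/delta."""
    lower = sb_raw_moment(r - 1, gamma, delta)
    upper = sb_raw_moment(r, gamma, delta)
    return exp(-0.5 / delta ** 2 - gamma / delta) * (lower - upper)


mean, sd, beta1, beta2 = sb_summary(0.0, 2.0)
print(round(mean, 5), round(sd, 5), round(beta1, 5), round(beta2, 5))
print(sb_raw_moment(2, 0.3 + 1 / 2.0, 2.0), recurrence_rhs(2, 0.3, 2.0))
```

For $\gamma = 0$, $\delta = 2.0$ the result can be compared with the corresponding row of Table 8.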


Table 8

                     $\delta = 0.5$                                  $\delta = 1.0$
 $\gamma$     $\mu'_1$    $\sigma$      $\beta_1$      $\beta_2$        $\mu'_1$    $\sigma$      $\beta_1$     $\beta_2$

 0.0   0.50000   0.31396    0.00000    1.62731      0.50000   0.20829   0.00000   2.13828
 0.5   0.35227   0.29610    0.36363    2.07024      0.39797   0.20151   0.12803   2.32409
 1.0   0.22480   0.24873    1.64723    3.65177      0.30327   0.18262   0.52856   2.90911
 1.5   0.12959   0.18679    4.59189    7.36733      0.22147   0.15541   1.24787   3.98260
 2.0   0.06767   0.12615   11.15751   15.97162      0.15546   0.12465   2.36420   5.71101
 2.5   0.03225   0.07717   26.40534   37.17352      0.10536   0.09468   3.98326   8.34077

                     $\delta = 2.0$
 $\gamma$     $\mu'_1$    $\sigma$      $\beta_1$      $\beta_2$

 0.0   0.50000   0.11813    0.00000    2.63131
 0.5   0.44125   0.11665    0.02084    2.66720
 1.0   0.38402   0.11235    0.08279    2.77419
 1.5   0.32971   0.10560    0.18409    2.95062
 2.0   0.27942   0.09697    0.32168    3.19309
 2.5   0.23304   0.08710    0.49116    3.49045





The transformation generating the system $S_B$ is symmetrical about $y = \tfrac12$, so that when $\gamma = 0$ the
distribution of $y$ is symmetrical. Positive values of $\gamma$ correspond to positive skewness.

Since the curve is symmetrical when $\gamma = 0$, it follows that $\mu'_1(0,\delta) = \tfrac12$, whatever be $\delta$. This can be
verified from (56). Putting $\gamma = 0$, we obtain

$$\mu'_1(0,\delta) = \frac{1+2\sum_{n=1}^{\infty} e^{-n^2/2\delta^2}}{2\sqrt{(2\pi)}\,\delta\left[1+2\sum_{n=1}^{\infty} e^{-2n^2\pi^2\delta^2}\right]}.$$










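This expression can be shown to be equal to $\tfrac12$ by putting $z = 0$, $\tau = 2\pi\delta^2 i$ in Jacobi's imaginary
transformation of the theta function

$$\vartheta_3(z\mid\tau) = (-i\tau)^{-\frac12}\,e^{z^2/\pi i\tau}\,\vartheta_3\left(\frac{z}{\tau}\,\Big|\,-\frac{1}{\tau}\right)$$

(Whittaker & Watson, 1946, p. 475).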
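The identity can also be confirmed numerically. The following sketch evaluates the ratio of the two series for a few values of $\delta$, reading the expression for $\mu'_1(0,\delta)$ as $\{1+2\sum e^{-n^2/2\delta^2}\}$ divided by $2\sqrt{(2\pi)}\,\delta\{1+2\sum e^{-2n^2\pi^2\delta^2}\}$ (an assumption about the printed layout); the function name and the truncation at 200 terms are illustrative choices.

```python
from math import exp, pi, sqrt


def mean_at_gamma_zero(delta, terms=200):
    """Series expression for the first moment of y when gamma = 0."""
    numerator = 1.0 + 2.0 * sum(
        exp(-n * n / (2.0 * delta * delta)) for n in range(1, terms + 1)
    )
    denominator = 1.0 + 2.0 * sum(
        exp(-2.0 * n * n * pi * pi * delta * delta) for n in range(1, terms + 1)
    )
    return numerator / (2.0 * sqrt(2.0 * pi) * delta * denominator)


for d in (0.5, 1.0, 2.0):
    print(d, mean_at_gamma_zero(d))
```

Each printed value should agree with one-half to within rounding error, as the theta-function transformation requires.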
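Also, since $\mu'_0(0,\delta) = 1$, it follows from (59) that

$$\mu'_1(\delta^{-1},\delta) = e^{-\frac12\delta^{-2}}(1-\tfrac12) = \tfrac12 e^{-\frac12\delta^{-2}}.$$

Again

$$\mu'_1(2\delta^{-1},\delta) = e^{-\frac32\delta^{-2}}(1-\tfrac12 e^{-\frac12\delta^{-2}}) = e^{-\frac32\delta^{-2}} - \tfrac12 e^{-2\delta^{-2}}.$$

Similarly

$$\mu'_1(3\delta^{-1},\delta) = e^{-\frac52\delta^{-2}} - e^{-4\delta^{-2}} + \tfrac12 e^{-\frac92\delta^{-2}},$$

and generally, if $k$ be a positive integer greater than unity,

$$\mu'_1(k\delta^{-1},\delta) = (-1)^{k-1}e^{-\frac12 k^2\delta^{-2}}\left[\tfrac12 + \sum_{j=1}^{k-1}(-1)^j e^{\frac12 j^2\delta^{-2}}\right]. \tag{60}$$

(60) is useful as a check formula.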
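The check formula can itself be checked against repeated application of (59). This is a minimal sketch, reading (60) as $(-1)^{k-1}e^{-\frac12k^2\delta^{-2}}[\tfrac12+\sum_{j=1}^{k-1}(-1)^je^{\frac12j^2\delta^{-2}}]$ (an assumption about the printed form); both function names are hypothetical.

```python
from math import exp


def mean_by_recurrence(k, delta):
    """Apply (59) with r = 1, k times, starting from mu'_1(0, delta) = 1/2."""
    m = 0.5
    for j in range(k):
        gamma = j / delta                       # current value of gamma
        m = exp(-0.5 / delta ** 2 - gamma / delta) * (1.0 - m)
    return m


def mean_closed_form(k, delta):
    """Closed form (60) for mu'_1(k/delta, delta)."""
    a = delta ** -2
    bracket = 0.5 + sum((-1) ** j * exp(0.5 * j * j * a) for j in range(1, k))
    return (-1) ** (k - 1) * exp(-0.5 * k * k * a) * bracket


for k in (1, 2, 3, 4):
    print(k, mean_by_recurrence(k, 1.0), mean_closed_form(k, 1.0))
```

With $\delta = 1.0$ the values for $k = 1$ and $k = 2$ correspond to $\gamma = 1.0$ and $\gamma = 2.0$, so they can be compared with the first-moment entries of Table 8.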
Further interesting formulae may be obtained starting from the equation

$$0 = \mu_3(0,\delta) = \mu'_3(0,\delta) - \tfrac32\mu'_2(0,\delta) + \tfrac14. \tag{61}$$

Such formulae are useful in checking calculations, but do not lead to simple formulae for $\mu'_2(k\delta^{-1},\delta)$,
since $\mu'_2(0,\delta)$ is not a sufficiently simple function of $\delta$.
We now proceed to consider limiting values of $\beta_1$ and $\beta_2$ for $S_B$. We can write (59) in the form


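$$\mu'_r(k\delta^{-1},\delta) = -e^{-\frac12(2k-1)\delta^{-2}}\,\Delta\mu'_{r-1}(\overline{k-1}\,\delta^{-1},\delta), \tag{62}$$

the forward difference, $\Delta$, applying to the subscript of $\mu'$. Applying (62) repeatedly and noting that (62)
holds for negative as well as positive values of $r$, we have

$$\mu'_r(k\delta^{-1},\delta) = (-1)^k e^{-\frac12 k^2\delta^{-2}}\,\Delta^k\mu'_{r-k}(0,\delta). \tag{63}$$

The $s$th negative moment of $y$ about zero is easily found to be

$$\mu'_{-s}(\gamma,\delta) = \sum_{t=0}^{s}\binom{s}{t}e^{\frac12 t^2\delta^{-2}+t\gamma\delta^{-1}},$$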
whence, if k>r, 
e k = -.. $4. 7 ae 1 k 
pi(kd-, 8) = (— 1)ke-¥8™ [Sin 2 ) eve oe (- 0 (7) u2-00,0) J. (64) 
t=0 \ r—l t=0 t 
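Hence if $k$ is large

$$\mu'_r(k\delta^{-1},\delta) \doteq e^{-\frac12 k^2\delta^{-2}}\,e^{\frac12(k-r)^2\delta^{-2}} = e^{-r(k/\delta)\delta^{-1}+\frac12 r^2\delta^{-2}}. \tag{65}$$

$\mu'_r(\gamma,\delta)$ is a continuous decreasing function of $\gamma$ for sufficiently large values of $\gamma$. Hence

$$\mu'_r(\gamma,\delta) \doteq e^{-r\gamma\delta^{-1}+\frac12 r^2\delta^{-2}} \quad\text{when } \gamma \text{ is large,}$$

and so

$$\lim_{\gamma\to\infty}\beta_1(\gamma,\delta) = (\omega-1)(\omega+2)^2, \qquad
\lim_{\gamma\to\infty}\beta_2(\gamma,\delta) = \omega^4+2\omega^3+3\omega^2-3, \tag{66}$$

where $\omega = e^{\delta^{-2}}$.

As $\gamma$ increases from zero to infinity, therefore, the $(\beta_1,\beta_2)$ point moves from a point on the axis of $\beta_2$
to a point on the $S_L$ line (cf. (14)).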
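Now consider the behaviour of $\mu'_r(\gamma,\delta)$ as $\delta$ decreases, $\gamma$ remaining fixed. When $\delta$ is small we have

$$(1+e^{-(z-\gamma)/\delta})^{-1} \doteq \begin{cases} 0 & \text{for } z<\gamma,\\ 1 & \text{for } z>\gamma,\end{cases}$$

so that

$$\mu'_1 \doteq \mu'_2 \doteq \mu'_3 \doteq \mu'_4 \doteq \frac{1}{\sqrt{(2\pi)}}\int_{\gamma}^{\infty} e^{-\frac12 z^2}\,dz. \tag{67}$$

As $\delta$ decreases the $(\beta_1,\beta_2)$ point therefore approaches a point on the boundary line $\beta_2-\beta_1-1 = 0$. As
$\gamma$ varies all points on the boundary are covered.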










REFERENCES 
Aroian, L. A. (1941). Ann. Math. Statist. 12, 429.
Baker, G. A. (1934). Ann. Math. Statist. 5, 113.
Bartlett, M. S. (1937). Suppl. J. R. Statist. Soc. 4, 137.
Bartlett, M. S. (1947). Biometrics, 3, 39.
Beall, G. (1942). Biometrika, 32, 243.
Burr, I. W. (1942). Ann. Math. Statist. 13, 215.
Charlier, C. V. L. (1905). Ark. Mat. Astr. Fys. 2, nos. 8 and 20.
Cochran, W. G. (1938). Empire J. Exp. Agric. 6, 157.
Cornish, E. A. & Fisher, R. A. (1937). Rev. Inst. Int. Statist. 5, 307.
Curtiss, J. H. (1943). Ann. Math. Statist. 14, 107.
Edgeworth, F. Y. (1898). J. R. Statist. Soc. 61, 670.
Epstein, B. (1947). J. Franklin Inst. 244, 471.
Fechner, G. T. (1897). Kollektivmasslehre. Leipzig: Engelmann.
Finney, D. J. (1941). Suppl. J. R. Statist. Soc. 7, 155.
Fréchet, M. (1928). Bull. Sci. Math. 63, 203.
Fréchet, M. (1937). Recherches Théoriques Modernes, 1. Paris: Gauthier-Villars.
Fréchet, M. (1939). Rev. Inst. Int. Statist. 7, 32.
Fréchet, M. (1945). Rev. Inst. Int. Statist. 13, 16.
Gaddum, J. H. (1945). Nature, Lond., 156, 463.
Galton, F. (1879). Proc. Roy. Soc. 29, 365.
Geary, R. C. (1947). Biometrika, 34, 209.
Gibrat, R. (1931). Les Inégalités Économiques. Paris: Librairie du Recueil Sirey.
Halmos, P. R. (1944). Ann. Math. Statist. 15, 182.
Hotelling, H. & Frankel, L. R. (1938). Ann. Math. Statist. 9, 87.
Kapteyn, J. C. (1903). Skew Frequency Curves in Biology and Statistics. Groningen: Astronomical Laboratory.
Kapteyn, J. C. & van Uven, M. J. (1916). Title as above.
Kolmogoroff, A. N. (1941). C.R. Acad. Sci. U.R.S.S. 31, no. 2, 99.
McAlister, D. (1879). Proc. Roy. Soc. 29, 367.
Milne-Thomson, L. & Comrie, L. J. (1931). Standard Four-Figure Mathematical Tables. London: Macmillan.
Mordell, L. J. (1920). Quart. J. Math., Oxford, 48, 329.
Mordell, L. J. (1933). Acta Math. 61, 323.
Olshen, C. A. (1938). Ann. Math. Statist. 9, 176.
Pearse, G. E. (1928). Biometrika, 20 A, 314.
Pearson, E. S. (1931). Biometrika, 23, 114.
Pearson, K. (1895). Philos. Trans. A, 186, 343.
Pretorius, S. J. (1930). Biometrika, 22, 109.
Quensel, C. E. (1945). Skand. Aktuar. 28, 141.
Rietz, H. L. (1922). Ann. Math. 23, 291.
Whittaker, E. T. & Watson, G. N. (1946). A Course of Modern Analysis, 4th ed. Cambridge University Press.
Wicksell, S. D. (1917). Ark. Mat. Astr. Fys. 12, no. 20.
Williams, C. B. (1937). Ann. Appl. Biol. 24, 404.
Williams, C. B. (1940). Biometrika, 31, 356.
Wilson, E. B. & Hilferty, M. M. (1931). Proc. Nat. Acad. Sci., Wash., 17, 694.
Wishart, J. (1947). Biometrika, 34, 170.
Yuan, P. T. (1933). Ann. Math. Statist. 4, 30.










RANK AND PRODUCT-MOMENT CORRELATION 


By M. G. KENDALL 


SUMMARY 


1. Various relations are known to hold between the rank correlation coefficient which
I have called $\tau$, Spearman's rank correlation coefficient (which I shall denote by $\rho_s$), and the
correlation $\rho$ in samples from a normal bivariate population. For instance, writing $t$ for
a value of $\tau$ in a sample of $n$ and similarly $r_s$ for a sample value of $\rho_s$, we have Greiner's relation

$$E(t) = \frac{2}{\pi}\sin^{-1}\rho, \tag{1}$$

Esscher's formula for the variance

$$\operatorname{var} t = \frac{2}{n(n-1)}\left[1-\left(\frac{2}{\pi}\sin^{-1}\rho\right)^2 + 2(n-2)\left\{\frac19-\left(\frac{2}{\pi}\sin^{-1}\tfrac12\rho\right)^2\right\}\right], \tag{2}$$

Moran's relation (1948)

$$E(r_s) = \frac{6}{\pi(n+1)}\{\sin^{-1}\rho + (n-2)\sin^{-1}\tfrac12\rho\}, \tag{3}$$

and K. Pearson's formula connecting grade and product-moment correlation, which is a
limiting case of (3),

$$\rho_s = \frac{6}{\pi}\sin^{-1}\tfrac12\rho. \tag{4}$$

Derivations of these results, other than Moran's, and references, are given in my Rank
Correlation Methods (1948).
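The four relations are easy to evaluate and to compare with each other. The sketch below simply codes equations (1) to (4) as read above; the function names are illustrative, and the values of $\rho$ and $n$ used are arbitrary.

```python
from math import asin, pi


def expected_t(rho):
    """Greiner's relation (1)."""
    return 2.0 / pi * asin(rho)


def var_t(rho, n):
    """Esscher's formula (2) for the variance of t in normal samples."""
    e = 2.0 / pi * asin(rho)
    s = 2.0 / pi * asin(0.5 * rho)
    return 2.0 / (n * (n - 1)) * (1.0 - e * e + 2.0 * (n - 2) * (1.0 / 9.0 - s * s))


def expected_rs(rho, n):
    """Moran's relation (3)."""
    return 6.0 / (pi * (n + 1)) * (asin(rho) + (n - 2) * asin(0.5 * rho))


def grade_correlation(rho):
    """K. Pearson's formula (4), the limit of (3) for large n."""
    return 6.0 / pi * asin(0.5 * rho)


n = 10
print(var_t(0.0, n), 2.0 * (2 * n + 5) / (9.0 * n * (n - 1)))  # null-case variance
print(expected_rs(0.6, 10 ** 6), grade_correlation(0.6))       # large-n limit
print(expected_t(1.0), expected_rs(1.0, n))                    # perfect correlation
```

At $\rho = 0$ formula (2) reduces to the familiar null variance $2(2n+5)/9n(n-1)$, and at $\rho = 1$ both expectations equal unity.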


2. In the present paper I extend these results in various directions and consider some
practical applications.

(a) I shall examine the effect of non-normality on equations (1), (2) and (3);

(b) An investigation into var $r_s$ suggests that an exact expression involves non-elementary
transcendental functions;

(c) An expansion of var $r_s$ in powers of $\rho$, however, is obtainable, and comparison with
K. Pearson's formula for the variance of estimates of $\rho$ based on equation (4) indicates that
his result is incorrect;

(d) The extension of $\tau$ to $m \times n$ tables offers some possibility of estimating the first product-
moment coefficient, even in non-normal variation.


EFFECT OF NON-NORMALITY ON t
Note on the bivariate Gram-Charlier series 


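3. Most studies in the past on the bivariate Gram-Charlier series consider an expansion
of the type

$$f(x,y) = \Phi(x,y) + \sum_{r+s\ge3} A_{rs}\frac{D_1^r D_2^s}{r!\,s!}\,\Phi(x,y), \tag{5}$$

where $D_1 = \partial/\partial x$, $D_2 = \partial/\partial y$ and

$$\Phi(x,y) = \frac{1}{2\pi(1-\rho^2)^{\frac12}}\exp\left[-\frac{1}{2(1-\rho^2)}(x^2-2\rho xy+y^2)\right]. \tag{6}$$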








178 Rank and product-moment correlation 


It is much simpler, at least for my present purposes, to use the alternative form

$$f(x,y) = \exp\left\{\sum{}' (-1)^{r+s}\kappa_{rs}\frac{D_1^r D_2^s}{r!\,s!}\right\}\alpha(x)\,\alpha(y), \tag{7}$$

where

$$\alpha(x) = \frac{1}{\sqrt{(2\pi)}}e^{-\frac12 x^2} \tag{8}$$

and the summation $\sum'$ extends over all values $r+s \ge 3$ together with the term $\kappa_{11}$. Here the
$\kappa$'s are the bivariate cumulants and the distribution is in standard measure, so that
$\kappa_{11} = \mu_{11} = \rho$ and the terms in $\kappa_{10}$, $\kappa_{01}$, $\kappa_{20}$ and $\kappa_{02}$ do not appear. It is easily shown that, as
for the univariate case, the cumulative function of (7) is

$$\psi(t_1,t_2) = \log\phi(t_1,t_2)
= \frac{(it_1)^2}{2!} + \frac{(it_2)^2}{2!} + \sum{}'\kappa_{rs}\frac{(it_1)^r(it_2)^s}{r!\,s!}
= -\tfrac12(t_1^2+2\rho t_1t_2+t_2^2) + \sum_{r+s\ge3}\kappa_{rs}\frac{(it_1)^r(it_2)^s}{r!\,s!}. \tag{9}$$

4. The equivalence of (5) and (7) when the constants $A_{rs}$ are suitably expressed in terms
of the cumulants is established without difficulty. The essential difference lies in the
appearance of $\kappa_{11}D_1D_2$ in the operator in (7) and the operand $\alpha(x)\,\alpha(y)$ instead of $\Phi(x,y)$. Now

$$\exp(\kappa_{11}D_1D_2)\,\alpha(x)\,\alpha(y) \tag{10}$$

has a cumulative function $-\tfrac12(t_1^2+2\rho t_1t_2+t_2^2)$,
and is therefore equal to $\Phi(x,y)$. Or again, if we expand (10) in powers of $\rho$ ($= \kappa_{11}$) we get the
first differential coefficient with respect to $x$ and to $y$ of the tetrachoric series, which is
equal to $\Phi(x,y)$.*
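5. In passing it may be noted from (7) that we have

$$\frac{\partial f}{\partial\kappa_{rs}} = \frac{(-1)^{r+s}}{r!\,s!}D_1^rD_2^s f \quad (r+s\ge3;\ r,s=1). \tag{11}$$

K. Pearson (1907) gave a particular case of this formula for the bivariate normal form,
namely,

$$\frac{\partial f}{\partial\rho} = \frac{\partial^2 f}{\partial x\,\partial y}. \tag{12}$$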


Relation between t and p 
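6. If two pairs of observed values are $x_i, y_i$ and $x_j, y_j$, we write

$$a_{ij} = \operatorname{sgn}(x_i-x_j) = \begin{cases} +1 & (x_i>x_j),\\ \phantom{+}0 & (x_i=x_j),\\ -1 & (x_i<x_j), \end{cases} \tag{13}$$

and similarly

$$b_{ij} = \operatorname{sgn}(y_i-y_j) = \begin{cases} +1 & (y_i>y_j),\\ \phantom{+}0 & (y_i=y_j),\\ -1 & (y_i<y_j). \end{cases} \tag{14}$$

$t$ in a sample of $n$ is then defined as

$$t = \Sigma a_{ij}b_{ij}\big/\{\Sigma a_{ij}^2\,\Sigma b_{ij}^2\}^{\frac12} = \Sigma a_{ij}b_{ij}/n(n-1), \tag{15}$$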

* If (7) is expanded, the coefficients will not in general decline so rapidly as in the expansion of 
(5); but as both forms have the same cumulative function this does not affect the above work. 































M. G. KENDALL 179 


where the summation $\Sigma$ takes place over the $n(n-1)$ possible pairs of comparisons in the
sample.
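The definition of $t$ in terms of sign scores can be written out directly. The following is a small sketch of (13) to (15) as read above, summing over all ordered pairs $i \ne j$; the function names and the toy data are illustrative only.

```python
def sgn(value):
    """Sign function: +1, 0 or -1."""
    return (value > 0) - (value < 0)


def kendall_t(xs, ys):
    """t = sum(a_ij * b_ij) / sqrt(sum(a_ij^2) * sum(b_ij^2)) over ordered pairs."""
    n = len(xs)
    sum_ab = sum_aa = sum_bb = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            a = sgn(xs[i] - xs[j])
            b = sgn(ys[i] - ys[j])
            sum_ab += a * b
            sum_aa += a * a
            sum_bb += b * b
    return sum_ab / (sum_aa * sum_bb) ** 0.5


xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]          # one discordant pair out of six
print(kendall_t(xs, ys))   # 4/6
print(kendall_t(xs, xs[::-1]))
```

When there are no ties every $a_{ij}^2$ and $b_{ij}^2$ equals unity, so the denominator is $n(n-1)$ as in the second form of (15).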


Thus $t$ depends only on the signs of the values $x_i-x_j$, $y_i-y_j$. The cumulative function of
$x_i-x_j$ and $y_i-y_j$ is given by

$$\psi(t_1,t_2) = \log E[\exp\{it_1(x_i-x_j)+it_2(y_i-y_j)\}]
= \log\phi(t_1,t_2) + \log\phi(-t_1,-t_2)
= -(t_1^2+2\rho t_1t_2+t_2^2) + 2\sum{}''\kappa_{rs}\frac{(it_1)^r(it_2)^s}{r!\,s!}, \tag{16}$$

where $\sum''$ extends over even values of $r+s \ge 4$. We then have at once the simple but important
result: Greiner's formula (equation (1)) is unaffected by skewness in the parent population
as measured by its moments or cumulants of odd order.


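7. I now neglect powers of fourth-order cumulants higher than the first and cumulants
of order six or more. The cumulative function (16) then becomes

$$\psi(t_1,t_2) = -(t_1^2+2\rho t_1t_2+t_2^2) + 2\left\{\frac{\kappa_{40}}{24}t_1^4+\frac{\kappa_{31}}{6}t_1^3t_2+\frac{\kappa_{22}}{4}t_1^2t_2^2+\frac{\kappa_{13}}{6}t_1t_2^3+\frac{\kappa_{04}}{24}t_2^4\right\}, \tag{17}$$

and the c.f. (characteristic function) to our order of approximation is

$$\phi(t_1,t_2) = \exp[-(t_1^2+2\rho t_1t_2+t_2^2)]\left\{1+\frac{\kappa_{40}}{12}t_1^4+\frac{\kappa_{31}}{3}t_1^3t_2+\frac{\kappa_{22}}{2}t_1^2t_2^2+\frac{\kappa_{13}}{3}t_1t_2^3+\frac{\kappa_{04}}{12}t_2^4\right\}. \tag{18}$$

Using the unitary function

$$\operatorname{sgn}\xi = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{e^{i\xi t}}{it}\,dt = \begin{cases} \phantom{+}1 & (\xi>0),\\ \phantom{+}0 & (\xi=0),\\ -1 & (\xi<0), \end{cases} \tag{19}$$

we have, from (15),

$$E(t) = E\{\operatorname{sgn}(x_i-x_j)\operatorname{sgn}(y_i-y_j)\}
= \frac{1}{\pi^2}\int_{-\infty}^{\infty}\frac{dt_1}{it_1}\int_{-\infty}^{\infty}\frac{dt_2}{it_2}\,E[\exp\{it_1(x_i-x_j)+it_2(y_i-y_j)\}]
= \frac{1}{\pi^2}\int_{-\infty}^{\infty}\frac{dt_1}{it_1}\int_{-\infty}^{\infty}\frac{dt_2}{it_2}\,\phi(t_1,t_2), \tag{20}$$

where $\phi(t_1,t_2)$ is given by (18). Owing to the differentials of type $dt/t$ we can change the scale
of $t_1$ and $t_2$ how we like and, reducing it by $\sqrt2$, we have

$$E(t) = \frac{1}{\pi^2}\int_{-\infty}^{\infty}\frac{dt_1}{it_1}\int_{-\infty}^{\infty}\frac{dt_2}{it_2}\exp[-\tfrac12(t_1^2+2\rho t_1t_2+t_2^2)]\left\{1+\frac{\kappa_{40}}{48}t_1^4+\frac{\kappa_{31}}{12}t_1^3t_2+\frac{\kappa_{22}}{8}t_1^2t_2^2+\frac{\kappa_{13}}{12}t_1t_2^3+\frac{\kappa_{04}}{48}t_2^4\right\}. \tag{21}$$

In this, one of the simpler cases to be discussed, the integrals may be evaluated directly in
various ways, but to assist later generalization it is as well to systematize the evaluation
from the start.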


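Writing

$$L = \frac{1}{\pi^2}\int_{-\infty}^{\infty}\frac{dt_1}{it_1}\int_{-\infty}^{\infty}\frac{dt_2}{it_2}\exp[-\tfrac12(\alpha t_1^2+2\rho t_1t_2+\beta t_2^2)], \tag{22}$$

we have

$$\frac{\partial L}{\partial\rho} = \frac{1}{\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\exp[-\tfrac12(\alpha t_1^2+2\rho t_1t_2+\beta t_2^2)]\,dt_1\,dt_2 = \frac{2}{\pi\sqrt{(\alpha\beta-\rho^2)}}, \tag{23}$$

and hence

$$L = \frac{2}{\pi}\sin^{-1}\frac{\rho}{\sqrt{(\alpha\beta)}}. \tag{24}$$

Putting $\alpha = \beta = 1$ we reach equation (1). Differentiating $L$ twice with respect to $\alpha$ we find
for the coefficient of $\kappa_{40}$ in (21)

$$\frac{1}{12}\frac{\partial^2}{\partial\alpha^2}\left\{\frac{2}{\pi}\sin^{-1}\frac{\rho}{\sqrt{(\alpha\beta)}}\right\},$$

which reduces to

$$\frac{1}{24\pi}(3\rho-2\rho^3)(1-\rho^2)^{-\frac32}.$$

The other terms are evaluated in a similar way, the coefficient of $\kappa_{31}$ being given by
$\partial^2L/(\partial\alpha\,\partial\rho)$ and that of $\kappa_{22}$ by $\partial^2L/\partial\rho^2$. Except for the terms in $\kappa_{40}$ and $\kappa_{04}$ we have one
differentiation already carried out in equation (23). We find

$$E(t) = \frac{2}{\pi}\sin^{-1}\rho + \frac{1}{24\pi(1-\rho^2)^{\frac32}}\{(\kappa_{40}+\kappa_{04})(3\rho-2\rho^3) - 4(\kappa_{31}+\kappa_{13}) + 6\rho\kappa_{22}\}. \tag{25}$$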


8. At first sight it looks as if this expression is at fault near $\rho = 1$, since $(1-\rho^2)^{-\frac32}$ there
becomes large. But there are limits on the value in braces in (25) which prevent the whole
term from becoming large. Consider, for example, the extreme case when $\rho = 1$. Since $x$
and $y$ are in standard measure we then have $y-x = 0$, and hence $E\{e^{it(y-x)}\} = 1$. Thus,
considering the separate vanishing powers in the expansion of this expectation we have, for
the cumulants

$$\sum_{j=0}^{r}(-1)^j\binom{r}{j}\kappa_{j,\,r-j} = 0, \tag{26}$$

and in particular

$$\kappa_{40}-4\kappa_{31}+6\kappa_{22}-4\kappa_{13}+\kappa_{04} = 0,$$

which is the value inside braces in (25) when $\rho = 1$. One suspects that it might be possible
to set limits to the corrective term in (25) when $\rho$ is given, but I have not reached any very
useful expressions for the purpose.

9. By the substitutions

$$\kappa_{40} = \mu_{40}-3, \qquad \kappa_{31} = \mu_{31}-3\rho, \qquad \kappa_{22} = \mu_{22}-2\rho^2-1, \tag{27}$$

it is readily verified that (25) remains true when $\mu$'s are written instead of $\kappa$'s. Since no
cumulant of order higher than two requires a grouping correction, raw or corrected moments
may be used indifferently in place of cumulants in (25).
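The statement that (25) is unchanged when moments replace cumulants can be illustrated in the normal case, where all fourth-order cumulants vanish while $\mu_{40} = \mu_{04} = 3$, $\mu_{31} = \mu_{13} = 3\rho$ and $\mu_{22} = 1+2\rho^2$. The sketch below codes (25) as read above; the function and argument names are hypothetical.

```python
from math import asin, pi


def expected_t_corrected(rho, c40=0.0, c04=0.0, c31=0.0, c13=0.0, c22=0.0):
    """Equation (25): Greiner's term plus the fourth-order correction.
    The c-arguments may be cumulants or, by section 9, moments."""
    braces = (c40 + c04) * (3 * rho - 2 * rho ** 3) - 4 * (c31 + c13) + 6 * rho * c22
    return 2.0 / pi * asin(rho) + braces / (24.0 * pi * (1.0 - rho ** 2) ** 1.5)


rho = 0.4
greiner = 2.0 / pi * asin(rho)
with_cumulants = expected_t_corrected(rho)                       # normal: all zero
with_moments = expected_t_corrected(rho, 3.0, 3.0, 3 * rho, 3 * rho, 1 + 2 * rho ** 2)

print(greiner, with_cumulants, with_moments)
```

All three printed values coincide, since the constant parts of the substitutions (27) cancel inside the braces.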


Effect of non-normality on var t

10. The evaluation of var $t$ proceeds from that of $E(t^2)$ by a development of the foregoing
method but is considerably more complicated. We have

$$E(t^2) = E\{\Sigma a_{ij}b_{ij}\}^2/n^2(n-1)^2 = \Sigma\{E(a_{ij}b_{ij}a_{kl}b_{kl})\}/n^2(n-1)^2. \tag{28}$$







There are three types of case:

(i) $2n(n-1)$ cases where $i = k$, $j = l$. The expectation of each term is $+1$.

(ii) $n(n-1)(n-2)(n-3)$ cases where $i \ne k$, $j \ne l$. The expectation of each term is then
$\{E(a_{ij}b_{ij})\}^2$.

(iii) $4n(n-1)(n-2)$ cases of the type when $i = k$ or $j = l$ but not both. This is the troublesome
term to evaluate. Consider the case $i = k = 1$, $j = 2$, $l = 3$. Then

$$E(a_{12}b_{12}a_{13}b_{13}) = \frac{1}{\pi^4}\int_{-\infty}^{\infty}\frac{dt_1}{it_1}\int_{-\infty}^{\infty}\frac{dt_2}{it_2}\int_{-\infty}^{\infty}\frac{dt_3}{it_3}\int_{-\infty}^{\infty}\frac{dt_4}{it_4}\,\phi(t_1,t_2,t_3,t_4), \tag{29}$$

where $\phi(t_1,t_2,t_3,t_4)$ is the c.f. of $x_1-x_2$, $y_1-y_2$ and $x_1-x_3$, $y_1-y_3$ (reduced to any convenient
scale, which throughout I determine by making the coefficient of the squared terms in $t$ equal to $-\tfrac12$).
This c.f. is the mean value of

$$\exp[it_1(x_1-x_2)+it_2(y_1-y_2)+it_3(x_1-x_3)+it_4(y_1-y_3)]
= \exp[i(t_1+t_3)x_1+i(t_2+t_4)y_1-it_1x_2-it_2y_2-it_3x_3-it_4y_3].$$

When we substitute expansions of the Gram-Charlier type we may neglect those terms which 
give a total odd power in ?’s, such as #t,, for the resulting integral will vanish. Thus no terms 
in the first power of the odd-order cumulants will survive but there will occur terms of type 
K3p X30 g;, ete. To our order of approximation the c.f. becomes 


P(t ta tg, ty) = Exp(— HE+H+ G+ G+ 2ply ty + by ty + pty ty + plats + tot, + plgt,)| 


[14a (ty + ty)* + tf +} + S(t + ty)° (tg +44) + Gta + Bly} 

+5 28f(t + ts)? (t+ t,)?+ 88+ 8+ at 34(t, + ty) (tp +t,)2 + 4,84 8} 
K ] 

psa. (t, + ts) 4-8} 


Kys 9 2 
“F(t +4)? (tg+% Me — Bt} a U4 + ts) (t. +t,)° - t,  — ty 03} 


2 
> Ate + t)*— 8 —#} | |. (30) 
"a 
We have to substitute in (29) and evaluate term by term. For the term independent of the
$\kappa$'s we have, from the known result leading to equation (2),

$$\left(\frac{2}{\pi}\sin^{-1}\rho\right)^2 - \left(\frac{2}{\pi}\sin^{-1}\tfrac12\rho\right)^2 + \frac19 = M(\rho), \text{ say.} \tag{31}$$
11. The evaluation of the integrals may be assisted by a general theorem which I now 
prove. 
Consider the integral 
te. (“ dt, (© dt 


in a — $Xa,;t,¢), (32) 


where m is even and the summation takes place over j and j from 1 to m. We may change the 
scale as we please and hence 


17? dt, f° dt, = 
= — Tone Ba etki 33 
r — fe _" xp| - 2 Saytat | -_ 






















when a is any positive constant. Then if the suffixes k, 1, ..., u, v are all different and m in 


number 3 3 1 = * 1 
(a =) I= aa | ah wf” atmexp| - sg Maastety | (34) 


Now let us suppose that a is the determinant of the form La,;t;t; which I suppose to be 
positive-definite and non-singular. Then the expression on the right in (34) is derivable 


immediately because the integral is that of a multivariate normal population and is thus 
equal to (27)! at. Hence 


0 7) 1 
A Sige st ae a 35 
(s-- a) rerqia—l) ? ( ) 
l I ue 
so that I = i day... { da,,,a-*"-»), (36) 


This is not in general a very easy way of deriving J but once J is known it gives a fairly simple 
method of deriving the integral of terms of type ¢}:t}... t?w exp [—42a,,t;t;]. Partial differ- 
entiation of (32) according to appropriate a;; gives the powers in t. 


12. Another useful device can be used for the terms in the fourth-order cumulants. 
We have 
watent 1 
u(7e5) 7 “| | OF = exp[— Halt, + ts)* + 2p(t, + ty) (to +t) 


+ Mt + ty)? + oct? + 2pt, t, + BR + af3 + 2ptyt,+ AtH}]. (37) 
Then the term in ky is the integral of the exponential term multiplied by a term 
(t, +t,)*+44+4 = H(t, +t)? +248}, 
and hence can be derived by twice differentiating M(p/,/(«f)) with respect to «. Similarly, 


the term in x,, is given by differentiation with respect to « and p. The term in Kk, is given by 
a differentiation of the form 


9 eo @ 
da0p * apt 
13. I omit the detailed working and give the results. Writing 
1 
A= S472 {(3p — 2p) (1 — p?)-! sin p + p*(1 — p?)! 
— (3p — 3p*) (1 — fp?)-# sin“ $p — Jp*%(1—}p*)}, (38) 





1 
B= — 5 {(l—p*)- sin p + p( — p?)*— 41 —4p?)-t sin bp — (1 — tp), (39) 
C= a2 (30(1— 9?) )-4sin- 1 p+ (2+ p?) (1 —p?)-! 
— 80(1 — fp*)-t sin 4p — (2 +p%)(1—4p%)}, (40) 
= =F p*) 
187? (1 —p?)?’ (41) 
a 
~ 372 (1 —p?)?’ (42) 
1 1+ )? 
~~ Bn (1 =p??? (43) 
2 Pp 


~ Qn®(1— p?)?” si 

























_ 1 [9+48p?— 1694 3p ne —p*)p" p>(3—p*) 
i =a 36(1—p%)2 161 — gpl * 4 — Jo —peyqa — 4p? * 241 —p*) (1 — Jp?)5] 
(45) 
_ 1{ —20p ‘Gar p(l+p*) 
ai pet sd—ipyn aa ERP) 
2p p(l+p??? 
‘OA cece OO 


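The additional terms to be added to that of (31) are then $\zeta$, say, where

$$\zeta = A(\kappa_{40}+\kappa_{04}) + B(\kappa_{31}+\kappa_{13}) + C\kappa_{22} + D(\kappa_{30}^2+\kappa_{03}^2) + E(\kappa_{30}\kappa_{21}+\kappa_{12}\kappa_{03})
+ F(\kappa_{30}\kappa_{12}+\kappa_{21}\kappa_{03}) + G\kappa_{30}\kappa_{03} + H(\kappa_{21}^2+\kappa_{12}^2) + J\kappa_{21}\kappa_{12}. \tag{47}$$

We then have

$$\operatorname{var} t = E(t^2) - \{E(t)\}^2
= \frac{2}{n(n-1)}\left[1-(2n-3)\{E(t)\}^2 + 2(n-2)\left\{\left(\frac{2}{\pi}\sin^{-1}\rho\right)^2 - \left(\frac{2}{\pi}\sin^{-1}\tfrac12\rho\right)^2 + \frac19 + \zeta\right\}\right]. \tag{48}$$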


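EFFECT OF NON-NORMALITY ON $r_s$

The relation between $r_s$ and $\rho$

14. If we write

$$a_i = \sum_{j=1}^{n}a_{ij}, \qquad b_i = \sum_{j=1}^{n}b_{ij}, \tag{49}$$

then Spearman's $\rho_s$ may be defined as

$$r_s = \frac{3}{n(n^2-1)}\sum_{i=1}^{n}a_ib_i. \tag{50}$$

We then have

$$\tfrac13 n(n^2-1)E(r_s) = E(\Sigma a_ib_i) = nE(a_ib_i) = nE(\Sigma a_{ij}\,\Sigma b_{ik})
= n(n-1)E(a_{ij}b_{ij}) + n(n-1)(n-2)E(a_{ij}b_{ik}) \quad (j\ne k). \tag{51}$$

Now

$$E(a_{ij}b_{ij}) = E(t) = \frac{2}{\pi}\sin^{-1}\rho,$$

and

$$E(a_{12}b_{13}) = \frac{1}{\pi^2}\int_{-\infty}^{\infty}\frac{dt_1}{it_1}\int_{-\infty}^{\infty}\frac{dt_2}{it_2}\,E\{e^{it_1(x_1-x_2)+it_2(y_1-y_3)}\}
= \frac{1}{\pi^2}\int_{-\infty}^{\infty}\frac{dt_1}{it_1}\int_{-\infty}^{\infty}\frac{dt_2}{it_2}\exp[-(t_1^2+\rho t_1t_2+t_2^2)]
= \frac{2}{\pi}\sin^{-1}\tfrac12\rho. \tag{52}$$

Substituting in (51) we find

$$E(r_s) = \frac{6}{\pi(n+1)}\{\sin^{-1}\rho + (n-2)\sin^{-1}\tfrac12\rho\} \tag{53}$$

as given in equation (3).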
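The definition of $r_s$ through the row sums $a_i$, $b_i$ can be compared with the usual rank-difference form of Spearman's coefficient. This sketch reads (50) as $r_s = 3\Sigma a_ib_i/n(n^2-1)$ (an assumption about the printed form) and assumes no ties; the function names and data are illustrative.

```python
def sgn(value):
    """Sign function: +1, 0 or -1."""
    return (value > 0) - (value < 0)


def spearman_from_scores(xs, ys):
    """r_s computed from a_i = sum_j sgn(x_i - x_j) and b_i likewise."""
    n = len(xs)
    a = [sum(sgn(xs[i] - xs[j]) for j in range(n)) for i in range(n)]
    b = [sum(sgn(ys[i] - ys[j]) for j in range(n)) for i in range(n)]
    return 3.0 * sum(ai * bi for ai, bi in zip(a, b)) / (n * (n * n - 1))


def spearman_from_rank_differences(xs, ys):
    """Classical form 1 - 6 * sum(d^2) / (n^3 - n), valid without ties."""
    n = len(xs)
    rank_x = {v: r for r, v in enumerate(sorted(xs), start=1)}
    rank_y = {v: r for r, v in enumerate(sorted(ys), start=1)}
    d2 = sum((rank_x[x] - rank_y[y]) ** 2 for x, y in zip(xs, ys))
    return 1.0 - 6.0 * d2 / (n ** 3 - n)


xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
print(spearman_from_scores(xs, ys), spearman_from_rank_differences(xs, ys))
```

The two forms agree because, without ties, $a_i$ is twice the deviation of the rank of $x_i$ from the mean rank.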








15. Consider now the modifications necessary to allow for non-normality. The cumulative
function of $x_1-x_2$ and $y_1-y_3$ is now, as far as fourth-order cumulants,


— (8+ pt,t.+8 )+ (* “20 (it,)8 4 “2 * (it)? (ita) + “32 (it,) (ity)? 


eee » (ity)? (ity) + “22 (it)? (it)? 


_— 





y 


Kos 








Sie ity) + 8 (ita). 


A few terms in this cancel. We may neglect those which give odd orders in the total power of 


— bk go( tty) — §Xog(tte)* 


the t-terms. We find then for the c.f. (reducing the scale by ,/2 in the usual way) 


exp[— 4(#+ pt,t,+8 Ny +73 


Kao 94 4 X31 ay, 4 “22 yoyo 4 “194 pa 4 


24 16 24 


Koa 4 
— Oe 
48 


On integrating over ‘ciety (dt,/it,) in the usual way, we find 


E(r,) = war —{sin-1 p+ (n—2)sin |p} 


+1) 
6(n — 


3 ga(Kay lily + Kyaly ay}. 


= Sar tt wcll. ities 693 + Kg) + Mog + N (x3, + 2g) + Pk i2Ko3}; 


where K =— 


1 
L= —Toy (1-30), 


oa £4) age 


N= 


P= 


3P () _452)-4 


all + 2(4p)"} {1-307} '. 


[(3(4p) — 2(4p)9} (1 — de®) 


(59) 


(60) 


16. It is instructive to consider how far the limit of equation (55) for large $n$ can be
derived by an extension of K. Pearson's original method. A comparison of the two approaches
well illustrates the advantages of the use of the characteristic function in the bivariate
Gram-Charlier expansion.

If $f_1(x)$ and $f_2(y)$ are the border-frequency distributions, the grades $\xi$ and $\eta$ are defined by

$$\xi = \int_{-\infty}^{x}f_1(x)\,dx, \qquad \eta = \int_{-\infty}^{y}f_2(y)\,dy.$$

Each has a mean of $\tfrac12$ and a variance of $\tfrac1{12}$. For the grade correlation we then have

$$\rho_g = 12\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\xi\eta f\,dx\,dy - 3, \tag{61}$$

where $f$ is the frequency function $f(x,y)$ and hence

$$\frac{\partial\rho_g}{\partial\rho} = 12\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\xi\eta\frac{\partial f}{\partial\rho}\,dx\,dy. \tag{62}$$










In virtue of equation (12) this becomes, after partial integrations with respect to x and y,
$$\frac{\partial\rho_g}{\partial\rho} = 12\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_1 f_2 f\,dx\,dy. \tag{63}$$
This result is generally true when $\rho$ is the first product-moment $\kappa_{11}$ entering explicitly into the frequency function. Pearson substituted for $f_1$, $f_2$ and $f$ in (63) in terms of the normal frequency functions and integrated to obtain
$$\frac{\partial\rho_g}{\partial\rho} = \frac{3}{\pi(1-\tfrac14\rho^2)^{1/2}}, \tag{64}$$
whence equation (4) follows.
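The step from (64) to equation (4) can be checked numerically. The sketch below assumes that equation (4), which is not reproduced in this part of the paper, is Pearson's grade-correlation relation ρ_g = (6/π) sin⁻¹ ½ρ, and compares a finite-difference derivative of that relation with the right-hand side of (64). The function names are illustrative.

```python
import math

def grade_corr(rho):
    # Assumed form of equation (4): rho_g = (6/pi) * asin(rho/2).
    return 6.0 / math.pi * math.asin(rho / 2.0)

def d_grade_corr(rho):
    # Right-hand side of equation (64).
    return 3.0 / (math.pi * math.sqrt(1.0 - 0.25 * rho * rho))

rho, h = 0.6, 1e-6
numeric = (grade_corr(rho + h) - grade_corr(rho - h)) / (2.0 * h)  # central difference
print(numeric, d_grade_corr(rho))
```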
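Now with the notation of §3, but writing $\beta$ for $\alpha(y)$, we have to our usual order of approximation:
$$f_1 = \Bigl(1 - \frac{\kappa_{30}}{6}D_1^3 + \frac{\kappa_{40}}{24}D_1^4 + \frac{\kappa_{30}^2}{72}D_1^6\Bigr)\alpha,$$
$$f_2 = \Bigl(1 - \frac{\kappa_{03}}{6}D_2^3 + \frac{\kappa_{04}}{24}D_2^4 + \frac{\kappa_{03}^2}{72}D_2^6\Bigr)\beta,$$
$$f = \Bigl\{1 - \frac{\kappa_{30}}{6}D_1^3 - \frac{\kappa_{21}}{2}D_1^2D_2 - \frac{\kappa_{12}}{2}D_1D_2^2 - \frac{\kappa_{03}}{6}D_2^3 + \frac{\kappa_{40}}{24}D_1^4 + \frac{\kappa_{31}}{6}D_1^3D_2 + \frac{\kappa_{22}}{4}D_1^2D_2^2 + \frac{\kappa_{13}}{6}D_1D_2^3 + \frac{\kappa_{04}}{24}D_2^4 + \frac12\Bigl(\frac{\kappa_{30}}{6}D_1^3 + \frac{\kappa_{21}}{2}D_1^2D_2 + \frac{\kappa_{12}}{2}D_1D_2^2 + \frac{\kappa_{03}}{6}D_2^3\Bigr)^2\Bigr\}\Phi.$$
When we substitute in (63) we obtain for the integrand a series of products of derivatives of $\alpha$, $\beta$ and $\Phi$. By partial integrations we can convert these into products of $\Phi$ and partial derivatives of $\alpha$ and $\beta$. It is then found that terms involving $\kappa_{30}$ and $\kappa_{03}$ vanish. We are left with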


1 +50 Di +“ DED, + "2 2 2 jDi+ Dp, D3 


+8 pty py Dp 1X12 9 pay “pe psl ap. (65) 


These expressions can be evaluated term by term. A comparison of (65) with (54) shows how the various terms correspond. The numerical coefficients in the latter are $\tfrac14$ of those in the former for $\kappa$'s of the fourth order and $\tfrac18$ for $\kappa$'s of the sixth order because the exponential in the integrand in (65) is on a scale $\sqrt{2}$ times that of (54).


THE SAMPLING VARIANCE OF $r_s$


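17. To find the variance of $r_s$ we have to evaluate $E(r_s^2)$, and here a novel point appears. We have
$$\{\tfrac13 n(n^2-1)\}^2 E(r_s^2) = E(\Sigma a_i b_i)^2 = E\Bigl(\sum_i \sum_j a_{ij} \sum_k b_{ik}\Bigr)^2 = E\{\Sigma\, a_{ij}b_{ik}a_{im}b_{ip}\} + E\{\Sigma\, a_{ij}b_{ik}a_{lm}b_{lp}\} \quad (i \neq l). \tag{66}$$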










One term contributing to this sum will be $E(a_{ij}b_{ik}a_{lm}b_{lj})$, where the suffixes with different letters are not equal. The c.f. of a typical term is the mean value of
$$\exp[it_1(x_1-x_2) + it_2(y_1-y_3) + it_3(x_4-x_5) + it_4(y_4-y_2)],$$
and is
$$\exp[-\tfrac12(t_1^2+t_2^2+t_3^2+t_4^2+\rho t_1t_2+\rho t_1t_4+\rho t_3t_4)]. \tag{67}$$
The expectation of the term is therefore proportional to
$$\frac{1}{\pi^4}\int_{-\infty}^{\infty}\frac{dt_1}{t_1}\cdots\int_{-\infty}^{\infty}\frac{dt_4}{t_4}\exp[-\tfrac12(t_1^2+t_2^2+t_3^2+t_4^2+\rho t_1t_2+\rho t_1t_4+\rho t_3t_4)].$$
If we differentiate twice with respect to $\rho$, one part of the resultant integral will be
$$\frac{1}{\pi^4}\int\!\cdots\!\int dt\,\exp[-\tfrac12(t_1^2+t_2^2+t_3^2+t_4^2+\rho t_1t_2+\rho t_1t_4+\rho t_3t_4)] = \frac{4}{\pi^2(1-\tfrac34\rho^2+\tfrac1{16}\rho^4)^{1/2}}. \tag{68}$$
Now this cannot arise from the differentiation of an 'ordinary' mathematical function. The first integral, in fact, is elliptic. It appears, therefore, that var $r_s$ and higher moments of $r_s$ will depend on non-elementary transcendental functions.


18. Consider the terms contributing to (66). Remembering the symmetry of some of the expressions in a and b and with the convention that all suffixes with different letters are different, we have the following types as far as terms of order $n^5$:


Type Number 

455 5:% Aim Dip n(n — 1) (n— 2) (n—3) (n—4) = rn 
5; A1m, Opp ; n® 
A bi, Gym Ory 2n® 
nu Gin Ov, 2n™ 
ij Oi 8m Onn 2n®) 
giome: bx 2n®) 
05505. Om Oy 2n) 


To order $n^5$ the number of terms sums to $n^6 - 4n^5$, which is correct because the total number of terms is $n^2(n-1)^4$. We then find, to order $n^{-1}$,
varr, = E(r2) —{E(r,)}* 
* war ppal{(n® — 16°) Bas dan big) + 2° H(A; Dip) + 2° Bb an bry) 

+ 2nP EA 55b A y_ Dyy) + 20> BG 55 B54 Ayn Opn) + 2NPE (O55 554. Qn Oye) + 2N5 L(A; 5b 54. Ay, by)} 

—n?(n — 1)? {B(a;;6,;) + (n— 2) E(a;5b;,)}*) 
[ — 9B (445 b5.)}* + B(G55b jp Oem Diy) + 4£E (4456 2m Ory) 

+ 2E (A155 5A 4q by.) + 2E (455 55m by5)]; (69) 
since E(A:6 5% Oy) = L(A; 524m ,,), 


in virtue of the symmetry between a and b.


The expectations may now be found as a power series in p. For example, to find 
E(a;;b2%m bp), we have, for a typical term a,36,, 42,695, a c.f. (reduced to scale) of 
exp{—HA+Q+ G+ G+ ply ty — plats —tyt, + ptst,)}. 
If the exponential in p is expanded we obtain integrals of type 


dt 
(—4p)s ‘i ao 7, ito — tats + tyt,)® exp {—4(4+8+8+ 4—-t,t,)}, 


——— 





which can be evaluated term by term. I find, as far as $\rho^8$,


Hioubatmte)~ (5) /3[3(5) *a(8) *iae(a) *ios(s) | 
Blagbatenby) = (5) [55+3(a) tst(3) *i315(3) *aeria(a) > 


lem 





Similarly, 








E(a;5654 4m Oy.) = 


























2\? 4 379/p\* 859 , 
ee Laake i 
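$$\{E(a_{ij}b_{ik})\}^2 = \Bigl(\frac{2}{\pi}\sin^{-1}\tfrac12\rho\Bigr)^2 = \Bigl(\frac{2}{\pi}\Bigr)^2\Bigl[\Bigl(\frac{\rho}{2}\Bigr)^2 + \frac13\Bigl(\frac{\rho}{2}\Bigr)^4 + \frac{8}{45}\Bigl(\frac{\rho}{2}\Bigr)^6 + \frac{4}{35}\Bigl(\frac{\rho}{2}\Bigr)^8\Bigr]. \tag{74}$$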
On substituting in (69) we find
$$\mathrm{var}\,r_s = \frac{1}{n}(1 - 1.563465\rho^2 + 0.304743\rho^4 + 0.155286\rho^6 + 0.081437\rho^8). \tag{75}$$


19. Now K. Pearson, in his discussion of grade correlations, gives a result which is equivalent to
$$\mathrm{var}\,r_s = \frac{1}{n}(1 - 1.6665507\rho^2 + 0.4336130\rho^4 + 0.1618337\rho^6 + 0.0495042\rho^8). \tag{76}$$
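To see how far the two series (75) and (76) actually differ, the following sketch evaluates both for a few values of ρ. The coefficients are copied from the printed equations; the function names and the choice n = 100 are illustrative assumptions.

```python
def var_rs_kendall(rho, n):
    """Series (75): large-sample variance of r_s as far as rho**8."""
    r2 = rho * rho
    return (1 - 1.563465 * r2 + 0.304743 * r2**2
            + 0.155286 * r2**3 + 0.081437 * r2**4) / n

def var_rs_pearson(rho, n):
    """Series (76): the result equivalent to K. Pearson's, as far as rho**8."""
    r2 = rho * rho
    return (1 - 1.6665507 * r2 + 0.4336130 * r2**2
            + 0.1618337 * r2**3 + 0.0495042 * r2**4) / n

n = 100  # illustrative sample size
for rho in (0.0, 0.3, 0.6, 0.9):
    print(rho, var_rs_kendall(rho, n), var_rs_pearson(rho, n))
```

Both series reduce to 1/n at ρ = 0, so the disagreement discussed in the text only appears for correlated parents.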


Without having given the matter much thought I had expected that (75) and (76) would 
agree, just as (3) and (53) agree in the limit. A check of my own formula having failed to reveal 
any error, I re-examined Pearson’s, with some rather interesting results. Pearson begins 
with the general large-sample formula for the variance of a product-moment correlation 





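$$\mathrm{var}\,r = \frac{\rho^2}{n}\Bigl[\frac{\mu_{22}}{\mu_{11}^2} + \frac{\mu_{22}}{2\mu_{20}\mu_{02}} + \frac{\mu_{40}}{4\mu_{20}^2} + \frac{\mu_{04}}{4\mu_{02}^2} - \frac{\mu_{31}}{\mu_{11}\mu_{20}} - \frac{\mu_{13}}{\mu_{11}\mu_{02}}\Bigr]. \tag{77}$$
Reducing this in virtue of certain symmetries and putting
$$\mu_{20} = \mu_{02} = \tfrac1{12}, \qquad \mu_{40} = \mu_{04} = \tfrac1{80}, \qquad \rho_g = 12\mu_{11},$$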
1 {foe . ° 
we have varr =— [ee + $p5)- 2p fx +e). (78) 
n (1050 139 


Pearson then evaluates $\mu_{22}$ and $\mu_{31}$ by some characteristically pertinacious mathematics and arrives at (76). I cannot find any material error in his work.*


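But it seems to me that the use of equation (77) is itself an error. The large-sample formula is derived by writing
$$r = \frac{m_{11}}{(m_{20}m_{02})^{1/2}}, \tag{79}$$
$$\frac{dr}{r} = \frac{dm_{11}}{m_{11}} - \frac12\frac{dm_{20}}{m_{20}} - \frac12\frac{dm_{02}}{m_{02}}, \tag{80}$$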


and proceeding in the usual way by squaring, summing and substituting for the various 
product-moments in terms of known parent-values. (The formula is derived as an example 
in my Advanced Theory, vol. 1, p. 211.) In short, the formula allows for sampling variation 


* There is a slight arithmetical error in the coefficient of p> in the expansion of 2p,, which is given by 
Pearson as 0-009,2650, whereas the correct value is 0-008,9525; but this is quite unimportant. 


















in the variances $m_{20}$ and $m_{02}$, whereas in the calculation of grade correlations the sample variances are always constant and equal to $\tfrac1{12}(n^2-1)$. If $m_{20}$ and $m_{02}$ are constant then the variance of r as given by (79) is simply
$$\mathrm{var}\,r = \frac{\mu_{22}-\mu_{11}^2}{n\,\mu_{20}\mu_{02}}, \tag{81}$$
and a comparison with (78) shows the importance of the difference.

20. Even (81) does not agree with my equation (75). The reason is that the large-sample 
theory does not allow for the appearance of expectations of certain terms with tied suffixes. 
But it is interesting to observe that the moment $\mu_{22}$ does appear as one of the expectations
in the exact result. 

In fact, for a grade $\xi$ we have, from the Fourier inversion formula,
$$\xi - \tfrac12 = \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{1-e^{-itx}}{it}\,\phi(t)\,dt, \tag{82}$$
where $\phi$ is the c.f. of the normal function $\alpha(x)$. In our convention as to the limiting value of the integral at zero we may write this as





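Thus
$$\mu_{22} = E\{(\xi-\tfrac12)^2(\eta-\tfrac12)^2\} = \frac{1}{(2\pi)^4}\int_{-\infty}^{\infty}\frac{dt_1}{t_1}\cdots\int_{-\infty}^{\infty}\frac{dt_4}{t_4}\int\!\!\int\exp(-it_1x-it_2x-it_3y-it_4y)\,\phi(t_1)\phi(t_2)\phi(t_3)\phi(t_4)\,d\Phi.$$
The integral with respect to x and y is the c.f. of $(t_1+t_2)$ and $(t_3+t_4)$ and hence is
$$\exp[-\tfrac12\{(t_1+t_2)^2+(t_3+t_4)^2+2\rho(t_1+t_2)(t_3+t_4)\}].$$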


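Hence
$$\mu_{22} = \frac{1}{(2\pi)^4}\int_{-\infty}^{\infty}\frac{dt_1}{t_1}\cdots\int_{-\infty}^{\infty}\frac{dt_4}{t_4}\exp[-\tfrac12\{t_1^2+t_2^2+t_3^2+t_4^2+(t_1+t_2)^2+(t_3+t_4)^2+2\rho(t_1+t_2)(t_3+t_4)\}] = \tfrac1{16}E(a_{ij}b_{ik}a_{im}b_{ip}).$$
Likewise $\mu_{11}^2 = \tfrac1{16}\{E(a_{ij}b_{ik})\}^2$, so that in (81)
$$\mathrm{var}\,r = \frac{9}{n}\bigl[E(a_{ij}b_{ik}a_{im}b_{ip}) - \{E(a_{ij}b_{ik})\}^2\bigr]. \tag{84}$$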


Now if we compare this with (69) the matter becomes clear. In the latter there are certain 
tied suffixes, and if they did not appear, e.g. if we had E(a;;b,).a),,6,,) instead of E(a2b;,.dj»,byp), 
the total coefficient of $\{E(a_{ij}b_{ik})\}^2$ would be $-9+4+2+2 = -1$, and the agreement would
be complete. The more exact method distinguishes classes of case which the large-sample 
formula treats as the same. 


ESTIMATION OF THE FIRST PRODUCT-MOMENT 
21. The relative insensitivity of $E(t)$ to departure of the parent population from normality raises the question whether the statistic $\sin\tfrac12\pi t$ would provide a good estimator of the first parent product-moment even in non-normal cases. $E(r_s)$ is more sensitive, but here again the







statistic $2\sin\tfrac16\pi r_s$ is a possible estimator. Neither, of course, will be as good as the actual sample product-moment r where such a statistic can be computed; but there is a class of case, intermediate between the ordinary bivariate frequency table and the contingency table, in which t and $r_s$ can be found but r cannot, namely, the case in which the rows and columns are arranged in some order although ranges of a variate are not assigned to them.


The 2 x 2 table 


22. It has been pointed out by Whitfield (1947) that when due regard is paid to ties the coefficient $\tau$ for a double dichotomy is the same as $\sqrt{(\chi^2/n)}$. Writing the 2 × 2 table in the form

    a        b        a + b
    c        d        c + d          (85)
    a + c    b + d    n

we have
$$t = \frac{ad - bc}{\{(a+b)(c+d)(a+c)(b+d)\}^{1/2}}. \tag{86}$$


This expression has one very interesting application. Suppose that a distribution is normal and is dichotomized at its medians so as to give

    a         ½n − a    ½n
    ½n − a    a         ½n           (87)
    ½n        ½n        n

Then
$$t = \frac{4a}{n} - 1. \tag{88}$$


Now the distribution is antisymmetric about the medians of the two variates. Consequently 
any loss of scoring in the calculation of 7 due to grouping is zero, because any loss due, say, 
to grouping in the top left-hand cell is offset by an equal and opposite gain in respect of the 
bottom right-hand cell. Hence (88) is exact in the sense that this value would be arrived at 
for median dichotomy even if there were no grouping. Hence, since $E(t) = \tau$ in the limiting case, we have exactly
$$\rho = \sin\tfrac12\pi\Bigl(\frac{4a}{n} - 1\Bigr) = -\cos\frac{2\pi a}{n}. \tag{89}$$


This is Sheppard’s theorem (1898) on median dichotomy. 
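The median-dichotomy result can be tried out directly. The sketch below assumes the layout of table (87), with `a` observations in each diagonal cell and ½n − a in each off-diagonal cell; the function name `sheppard_rho` is illustrative only.

```python
import math

def sheppard_rho(a, n):
    """Estimate rho from a median dichotomy with `a` in each diagonal cell."""
    t = 4.0 * a / n - 1.0                  # equation (88)
    return math.sin(0.5 * math.pi * t)     # equation (89)

print(sheppard_rho(250, 1000))   # all four cells equal: no association
print(sheppard_rho(400, 1000))   # 80% of observations on the diagonal
print(sheppard_rho(500, 1000))   # everything on the diagonal
```

With a = ¼n the estimate is zero and with a = ½n it is unity, as one would expect from the two extreme tables.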


23. If we calculate $\rho_s$ for a 2 × 2 table we arrive at the same value as for $\tau$, with due allowance for ties in both cases. This is evident from the consideration that in rankings of two $\rho_s$ and $\tau$ are equivalent, and it may also be verified directly from (85). In fact, the denominator entering into $\rho_s$ is (Kendall, 1948, p. 29) the product of two factors, one of which is the square root of
$$\tfrac16(n^3-n) - \tfrac16\{(a+b)^3-(a+b)+(c+d)^3-(c+d)\} = \tfrac12 n(a+b)(c+d),$$
and the other the square root of $\tfrac12 n(a+c)(b+d)$. The numerator is $\tfrac12 n(ad-bc)$, and hence $\rho_s$ is also given by (86).













24. For the 2 × 2 table we then have the choice of the two estimators $\sin\tfrac12\pi t$ and $2\sin\tfrac16\pi r_s$. They are not in general equal, and, in fact, for small t one is about 50 % greater than the other. This is a little disconcerting until we remember that $\rho_s$ for a ranking of two is a very rough measure of relationship and may be seriously affected by grouping,* whereas t is not much affected when the dichotomy is near the median and relates to a normal population. It thus appears rather unsafe to use either estimator if the dichotomy is far removed from the median; and in the contrary case $\sin\tfrac12\pi t$ is probably better than $2\sin\tfrac16\pi t$.
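The "about 50 %" figure follows from the leading terms of the two sines (½πt against ⅓πt). A short sketch, with illustrative function names, tabulates the two estimators when the same value is supplied to both, as happens in the 2 × 2 case where $r_s$ and t coincide:

```python
import math

def est_from_t(t):
    """Estimator sin(pi*t/2) based on t."""
    return math.sin(0.5 * math.pi * t)

def est_from_rs(rs):
    """Estimator 2*sin(pi*r_s/6) based on r_s."""
    return 2.0 * math.sin(math.pi * rs / 6.0)

for t in (0.01, 0.2, 0.5, 0.9):
    # In a 2 x 2 table r_s equals t, so the same value is passed to both.
    print(t, est_from_t(t), est_from_rs(t), est_from_t(t) / est_from_rs(t))
```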


Example 1. Pretorius (1930) gives a number of distributions varying from the markedly 
skew to the nearly normal, together with their moments up to and including those of the 
fourth order. 

Consider first of all the corrective factor in equation (25). Using Pretorius’s values, I find 
for this factor the following values (correlation data from his Tables I, II, III and VI, 
respectively): 








    Distribution                      Parent ρ    Corrective factor for non-normality
    Marriages                         0.7082      −0.0597
    Parents                           0.7349       0.0463
    Barometric heights (full year)    0.5807       0.0129
    Beans                             0.7811      −0.0095




















The corrections are reasonably small. 


I then grouped the distributions so as to give dichotomy as near the median as possible, 
finding 
































    Marriages                               Parents
     87,325     58,019    145,344           276,553     53,948    330,501
     34,793    121,648    156,441            75,811    225,370    301,181
    122,118    179,667    301,785           352,364    279,318    631,682

    Barometric heights                      Beans
     10,384      4,251     14,635             4,161        517      4,678
      4,250      9,970     14,220             1,680      3,082      4,762
     14,634     14,221     28,855             5,841      3,599      9,440





* If we put n = 2 in (53) we get $E(r_s) = \frac{2}{\pi}\sin^{-1}\rho$, which agrees with the result for t. For small n the formula $2\sin\tfrac16\pi t$ is badly biased and this affects the calculation for a 2 × 2 table.







As the correlations in all four cases are high I took two further examples quoted in my Advanced Theory, vol. 1: (p. 27) Tocher's data for cows (r = 0.2189) and (p. 324) Koga and Morant's data for highest audible pitch (r = −0.6136). The resulting tables were:














    Cows                              Audible pitch
    1,407    1,078    2,485             809    1,383    2,192
      881    1,546    2,427             799      388    1,187
    2,288    2,624    4,912           1,608    1,771    3,379














I then find for these six tables: 








    Distribution          Product-moment    sin ½πt     2 sin ⅙πt    Tetrachoric r
                          correlation
    Marriages              0.7082            0.5688      0.4006        1
    Parents                0.7349            0.7982      0.6064       > 0.95
    Barometric heights     0.5807            0.6013      0.4268        0.65
    Beans                  0.7811            0.7630      0.5609        0.05
    Cows                   0.2189            0.3145      0.2130       −0.15
    Audible pitch         −0.6136           −0.4408     −0.3031       −1























The agreement between r and $\sin\tfrac12\pi t$ is no more than fair, and that between r and $2\sin\tfrac16\pi t$ is much worse. The distribution of marriages is very leptokurtic, but this by itself should not account for the poor correspondence in that case. The main reason, I think, is the skewness of the distribution which, though not affecting the correction for non-normality, brings about a sweeping amalgamation of rows and columns on one side of the median but not on the other, so that the effect of grouping is substantial and one-sided. I give also some approximate values of tetrachoric r, which are so bad that more refined calculation is not worth while. Our methods are at least a great improvement on tetrachorics, which seem to be extremely sensitive to departure from normality.
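The two rank-based columns of the table can be reproduced from the six 2 × 2 tables with a few lines of code. This is a sketch only: the cell counts are those printed above, the coefficient is computed from (86), and the function and variable names are illustrative.

```python
import math

def t_2x2(a, b, c, d):
    """Rank coefficient t for a 2 x 2 table with cells a, b / c, d (equation 86)."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Cell counts (a, b, c, d) read row by row from the tables in Example 1.
tables = {
    "Marriages": (87325, 58019, 34793, 121648),
    "Parents": (276553, 53948, 75811, 225370),
    "Barometric heights": (10384, 4251, 4250, 9970),
    "Beans": (4161, 517, 1680, 3082),
    "Cows": (1407, 1078, 881, 1546),
    "Audible pitch": (809, 1383, 799, 388),
}

results = {}
for name, cells in tables.items():
    t = t_2x2(*cells)
    results[name] = (
        math.sin(0.5 * math.pi * t),       # estimator based on t
        2.0 * math.sin(math.pi * t / 6.0)  # estimator based on r_s (= t here)
    )
    print(f"{name:20s} t={t:.4f}  {results[name][0]:.4f}  {results[name][1]:.4f}")
```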


m × n tables


25. In the more general case of an m × n table (m or n greater than 2) we might expect better results, less information being lost by grouping. The calculation of t can be extended to such cases without much difficulty. (That of $r_s$ approximates to the calculation of r itself for grouped bivariate data and need not be separately considered.) The method is best illustrated by an example.


Example 2. The table below shows a 4 × 4 grouping of the bean data referred to in the previous example.

No score results from the number 332 in conjunction with any members in the same row and column, but a positive score results from all the other cells. Thus, for instance, the score resulting from the cell in the second row and column is 332 × 5,550. The number in the first row and second column, 121, gives a positive score with members lying to the right and a negative score with those to the left; and so on. The total score is

    332(5550 + 126 + 0 + 1420 + 642 + 30 + 1 + 53 + 32)
    + 121(−1128 + 126 + 0 − 5 + 642 + 30 − 0 + 53 + 32) + 0 + 0
    + 1128(1420 + 642 + 30 + 1 + 53 + 32) + 5550(−5 + 642 + 30 − 0 + 53 + 32)
    + 126(−5 − 1420 + 30 − 0 − 1 + 32) + 0 + 5(1 + 53 + 32)
    + 1420(−0 + 53 + 32) + 642(−0 − 1 + 32) + 30(−0 − 1 − 53) = 9,175,210.























      332      121       0      0      453
    1,128    5,550     126      0    6,804
        5    1,420     642     30    2,097
        0        1      53     32       86
    1,465    7,092     821     62    9,440





The divisor is given by the square root of the product of two factors derived from row and 
column totals. 


$$U = \tfrac12(9440^2 - 453^2 - 6804^2 - 2097^2 - 86^2) = 19{,}104{,}585,$$
$$V = \tfrac12(9440^2 - 1465^2 - 7092^2 - 821^2 - 62^2) = 17{,}996{,}513.$$
Hence
$$t = \frac{9{,}175{,}210}{\sqrt{(19{,}104{,}585 \times 17{,}996{,}513)}} = 0.49483,$$
$$\sin\tfrac12\pi t = 0.7013 \quad (\text{against } r = 0.7811).$$

The estimate in this case is worse than that given by the 2 × 2 table, but we might have expected this result from the nature of the grouping.
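The score and divisor for this example can be checked by direct pair counting. The sketch below is one straightforward way of doing it, not the author's own procedure: every cell is paired with the cells in lower rows, scoring +1 for those to the right and −1 for those to the left. The names `score` and `tie_factor` are illustrative.

```python
import math

table = [
    [332, 121, 0, 0],
    [1128, 5550, 126, 0],
    [5, 1420, 642, 30],
    [0, 1, 53, 32],
]

def score(tab):
    """S: each cell times (cells below and to the right minus cells below and to the left)."""
    rows, cols = len(tab), len(tab[0])
    s = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):
                for l in range(cols):
                    if l > j:
                        s += tab[i][j] * tab[k][l]
                    elif l < j:
                        s -= tab[i][j] * tab[k][l]
    return s

def tie_factor(total, margins):
    """Half of (total squared minus the sum of squared marginal totals)."""
    return (total * total - sum(m * m for m in margins)) // 2

n = sum(sum(row) for row in table)
S = score(table)
U = tie_factor(n, [sum(row) for row in table])          # from row totals
V = tie_factor(n, [sum(col) for col in zip(*table)])    # from column totals
t = S / math.sqrt(U * V)
estimate = math.sin(0.5 * math.pi * t)
print(S, U, V, round(t, 5), round(estimate, 4))
```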
The arithmetic of determining the score may be systematized. Taking the above table as an illustration, we form for each row the sum of the members lying to the right of a particular cell less those lying to the left, e.g.

     121     −332     −453     −453
    5676    −1002    −6678    −6804
    2092      667    −1395    −2067
      86       85       31      −54

The process is repeated for this table by operating on the columns, giving

     7854     −250    −8042    −8925
     2057     1084     −911    −1668
    −5711     1419     7162     7203
    −7889      667     8526     9324

The score S is then one-half of the sum of the products of these numbers by the corresponding cell frequencies in the original table, i.e.

    2S = (332 × 7854) + (121 × −250) + etc. + (32 × 9324) = 18,350,420.
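The systematized procedure translates almost line for line into code. In this sketch the helper `later_minus_earlier` (an illustrative name) replaces each entry of a sequence by the sum of the later entries less the sum of the earlier ones; it is applied first along rows and then down the columns of the result.

```python
table = [
    [332, 121, 0, 0],
    [1128, 5550, 126, 0],
    [5, 1420, 642, 30],
    [0, 1, 53, 32],
]

def later_minus_earlier(seq):
    """For each position: sum of entries after it minus sum of entries before it."""
    total = sum(seq)
    out, before = [], 0
    for x in seq:
        after = total - before - x
        out.append(after - before)
        before += x
    return out

# Step 1: operate on rows (right minus left).
step1 = [later_minus_earlier(row) for row in table]
# Step 2: operate on the columns of step1 (below minus above), then transpose back.
step2 = [list(r) for r in zip(*[later_minus_earlier(list(c)) for c in zip(*step1)])]

two_S = sum(table[i][j] * step2[i][j] for i in range(4) for j in range(4))
print(step1)
print(step2)
print(two_S, two_S // 2)
```

Each pair of cells is counted twice in the final sum, which is why the total has to be halved to give S.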










As a matter of interest I worked out the value of t for the full 16 × 12 table of the original bean data, though this involves more arithmetic than one would want to spend on such work in practice, and found t = 0.6243, $\sin\tfrac12\pi t$ = 0.8309 against r = 0.7811. The 4 × 4 table used earlier in this example is a condensation of the original table obtained by amalgamating consecutive sets of fours by columns and consecutive sets of threes by rows.


26. To sum up, it appears that the use of equation (1) may be moderately reliable for 
grouped data but is not very accurate as providing an estimator of the first product-moment. 
A good deal depends on the nature of the grouping. It is possible that further research may 
provide grouping corrections which will improve the estimator, or that rules may be dictated 
by theoretical considerations which govern the optimum grouping for ranked data to permit 
of the estimation of product-moments. 


I am indebted to Mr. A. K. Gayen, who read this paper in typescript, for calling my attention to the facts that the order of magnitude of the coefficients in the expansions of (5) and (7) has been considered from the point of view of elementary errors by C. E. Quensel (1938, Lunds Univ. Arsskr. N.F. 34, 4, 1) and that E. C. Rhodes (1925, Biometrika, 17, 318) investigated the effect of non-normality on K. Pearson's formula (equation (4)), though his results appear to be inexact owing to the retention in the final formula of terms of the same order as some which have previously been neglected.


REFERENCES 


KENDALL, M. G. (1941). The Advanced Theory of Statistics, 1. (Fourth edition, 1948.) London: Charles Griffin and Co.

KENDALL, M. G. (1948). Rank Correlation Methods. London: Charles Griffin and Co.

MORAN, P. A. P. (1948). Rank correlation and product-moment correlation. Biometrika, 35, 203.

PEARSON, K. (1907). On further methods of determining correlation. Drapers Co. Res. Mem. Cambridge University Press.

PRETORIUS, S. J. (1930). Skew bivariate frequency surfaces, examined in the light of numerical illustrations. Biometrika, 22, 109.

SHEPPARD, W. F. (1898). On the application of the theory of error to cases of normal distribution and normal correlation. Philos. Trans. A, 192, 101.

WHITFIELD, J. W. (1947). Rank correlation between two variables, one of which is ranked, the other dichotomous. Biometrika, 34, 292.




TESTS OF SIGNIFICANCE IN HARMONIC ANALYSIS 
By H. O. HARTLEY


1. INTRODUCTION 


The classical harmonic analysis and closely related periodogram analysis have, in recent 
years, been the subject of severe criticisms (Kendall, 1946a,b). These have been mainly 
of two kinds: 

(a) That the analysis has been widely misused in situations where it is not appropriate 
and has thereby led to faulty conclusions. 

(b) That the tests of significance used are based on the assumption of a random series and 
are therefore hardly ever applicable. 

In this paper we do not wish to deal with (a) except to emphasize that it is the misuse of 
a good tool that should be criticized. Harmonic and periodogram analysis have, of course, 
been useful in their definite but restricted fields of applications. When a reasonable theory 
suggests that the systematic component of the data is composed of a moderate number of 
sinusoidal terms, such an analysis is appropriate. As examples, we may mention here investigations
into instrumental error and resonance behaviour, analysis of sound tracks and of non-centralities
in the surface of circular machine parts, numerous problems in astronomy and
meteorology and many others. On the other hand, it would be inappropriate to use periodo- 
gram analysis when determining the period of what is suspected to be a heavily damped 
vibration, simply because the dominant harmonics in the Fourier expansion of such a vibra- 
tion bear no relation to its frequency and the higher harmonics required for its representation 
will not become apparent. The procedure is just as inappropriate as would be an attempt to 
obtain, say, a quadratic regression by harmonic analysis. 

It is equally inappropriate to use periodogram analysis in a search for ‘periods’ vaguely 
defined by verbal descriptions, usually of a kind to suggest that a ‘period’ is a time interval 
for which the serial correlation is high. It is not surprising that ‘periods’ so defined are best 
determined, if they must be, by the correlogram. Or, again, if we suspect the series to be 
generated by an autoregressive scheme, harmonics are unsuited for the estimation of its 
parameters. 

However, in this paper we want to deal with (b), and here we should quote Kendall's
(1946a) summary of his discussion of the Schuster, Walker and Fisher tests: 

‘All the tests we have described are based on random normal variation in the original 
series; but in practice nobody would embark on the labour of a periodogram analysis unless 
he had satisfied himself that the data were not random. It seems to me therefore that these 
tests are really off the main point, being tests based on a hypothesis which we have already 
rejected. They are not without their usefulness, however. We may assume with some con- 
fidence that if a particular intensity in the series is not shown as significant on the hypothesis of 
random variation, it is not significant when the series is systematic. What does not follow is 
that if one intensity is significant, then others must be so even if they exceed the significance 
values; for they are not independent of the significant value, at least for short series. What we 
ought to do perhaps is to extract the component which is considered significant from the 










series and then analyse the remainder; and so on as long as significant terms appear. But this 
is hardly a practical computational possibility. Tests of significance in the periodogram, as 
in the correlogram, remain undiscovered.’ 

Kendall’s main point, therefore, is that the ‘significance’ of an observed intensity may be 
‘misleading’ in that it does not necessarily indicate the reality of the period for which it 
was observed but, possibly, that of other periods or, indeed, quite a different non-random 
behaviour of the series. 

Whilst it would appear to be difficult to develop a test entirely free from the above criticisms, 
we will show that most of the difficulties can be overcome by using tests in which the periodo- 
gram intensities are independent or slightly correlated, and by applying some recent results 
in analysis of variance test distributions. 

We shall fix the ideas by developing tests first under very restrictive assumptions, but will 
proceed step by step to more general hypotheses.


2. TEST FOR THE MAXIMUM HARMONIC INTENSITY 


Let $y_t$ ($t = 0, 1, \ldots, n-1$) be a series of n observations at equidistant intervals of the independent variable x. Without loss of generality we may assume that $y_t$ was observed at $x_t = 2\pi t/n$. Harmonic analysis of this series consists in the fitting of, say, 2m + 1 regression coefficients
$$a_0, a_1, \ldots, a_m, b_1, \ldots, b_m$$
in the form of the regression function
$$Y_t = a_0 + \sum_{i=1}^{m}(a_i\cos it\gamma + b_i\sin it\gamma), \tag{1}$$
where $\gamma = 2\pi/n$.


It follows from standard regression technique and from the exact orthogonality of the trigonometric functions that
$$a_0 = \frac1n\sum_{t=0}^{n-1} y_t = \bar{y}, \qquad a_i = \frac2n\sum_{t=0}^{n-1} y_t\cos it\gamma, \qquad b_i = \frac2n\sum_{t=0}^{n-1} y_t\sin it\gamma, \tag{2}$$
from which we compute the 'observed intensities' $S_i^2 = a_i^2 + b_i^2$ corresponding to period i. Moreover, if $m < \tfrac12(n-1)$ the residual sum of squares is given by
$$R^2 = \sum_{t=0}^{n-1}(y_t - Y_t)^2 = \sum_{t=0}^{n-1}(y_t - \bar{y})^2 - \tfrac12 n\sum_{i=1}^{m}(a_i^2 + b_i^2). \tag{3}$$
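Equations (2) and (3) can be sketched in a few lines. The function `harmonic_fit` and the synthetic series below are illustrative assumptions; the series is built from two exact harmonics with no noise, so the fitted coefficients should reproduce the amplitudes and the residual sum of squares should vanish.

```python
import math

def harmonic_fit(y, m):
    """Coefficients (2), intensities S_i^2 and residual sum of squares (3)."""
    n = len(y)
    gamma = 2.0 * math.pi / n
    a0 = sum(y) / n
    a = [2.0 / n * sum(y[t] * math.cos(i * t * gamma) for t in range(n))
         for i in range(1, m + 1)]
    b = [2.0 / n * sum(y[t] * math.sin(i * t * gamma) for t in range(n))
         for i in range(1, m + 1)]
    intensities = [ai * ai + bi * bi for ai, bi in zip(a, b)]
    residual = sum((v - a0) ** 2 for v in y) - 0.5 * n * sum(intensities)
    return a0, a, b, intensities, residual

n = 12
gamma = 2.0 * math.pi / n
# Synthetic series: mean 3, a cosine at i = 2 and a sine at i = 3 (no noise).
y = [3.0 + 2.0 * math.cos(2 * t * gamma) + 0.5 * math.sin(3 * t * gamma)
     for t in range(n)]
a0, a, b, S2, R2 = harmonic_fit(y, 4)   # m = 4 satisfies m < (n - 1)/2
print(a0, a, b, S2, R2)
```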


In order to test the significance of the harmonic term with the largest intensity $S^2_{\max}$, we start from the hypothesis of a completely random series, say ${}_mH_0$, i.e. we assume

${}_mH_0$: $y_t = \epsilon_t$, where the $\epsilon_t$ are independent normal deviates with variance $\sigma^2$.

Under this hypothesis, the m intensities $\tfrac12 nS_i^2$ ($i = 1, 2, \ldots, m$) are all independent $\chi^2$ values, each based on two degrees of freedom. The chance for the largest intensity to exceed a given level x, say, is therefore given by
$$1 - (1 - e^{-\frac12 x})^m, \tag{4}$$
which is Walker's (1914) criterion. Fisher (1929) has obtained an ingenious exact test for the case $m = \tfrac12(n-1)$, n odd.
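Walker's criterion (4) is simple enough to evaluate directly. In the sketch below the argument `x` is taken to be the level for an intensity already expressed as a χ² with two degrees of freedom, that is, in units of σ² (an assumption about scaling made here for the illustration); the function name is illustrative.

```python
import math

def walker_tail(x, m):
    """Chance that the largest of m independent chi-square(2) intensities exceeds x."""
    return 1.0 - (1.0 - math.exp(-0.5 * x)) ** m

for m in (1, 5, 20):
    print(m, walker_tail(10.0, m))
```

For m = 1 the expression reduces to the ordinary χ² tail with two degrees of freedom, and it grows with m because more intensities are being searched.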

Here we prefer to convert Walker's criterion into an exact test by making use of the residual (3) as an independent estimate of $\sigma^2$ and then to 'studentize' (Hartley, 1944) Walker's criterion. The resulting test is one for the maximum variance ratio
$$F_{\max} = \tfrac14 nS^2_{\max}(n - 2m - 1)/R^2. \tag{5}$$




To find the probability integral of $F_{\max}$ let $\phi_\nu(s)$ denote the distribution of a sample standard deviation based on $\nu$ degrees of freedom; then the chance of $F_{\max} < F^*$ is given by
$$P(F^*) = \int_0^\infty \phi_\nu(s)\,(1 - \exp\{-s^2F^*\})^m\,ds, \tag{6}$$
where $\nu = n - 2m - 1$.
This integral was evaluated by Finney (1941), who found
$$P(F^*) = \sum_{r=0}^{m}(-1)^r\binom{m}{r}\Bigl(1 + \frac{2rF^*}{\nu}\Bigr)^{-\frac12\nu}. \tag{7}$$


Earlier the present writer (Hartley, 1938) suggested a method of approximating to the integral, resulting here in the simpler formula
$$P(F^*) = \{1 - (1 + 2F^*/\nu)^{-\frac12\nu}\}^m, \tag{8}$$
which is discussed in detail by Finney. A further approximation which we use in the examples of §7 is as follows:

Instead of evaluating the upper 100α % point of the distribution (7), use the upper 100α/m % point of the F distribution based on 2 and $\nu$ degrees of freedom. This approximation is, of course, only valid for upper percentage points.
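The three routes to a significance point can be compared numerically. The sketch below implements (7) and (8) as written and uses the closed form of the F tail with 2 and ν degrees of freedom, $(1 + 2F/\nu)^{-\frac12\nu}$, to obtain the upper 100α/m % point. Function names and the values m = 5, ν = 20, α = 0.05 are illustrative assumptions.

```python
import math

def p_finney(F, m, nu):
    """Probability integral (7) of F_max."""
    return sum((-1) ** r * math.comb(m, r) * (1.0 + 2.0 * r * F / nu) ** (-0.5 * nu)
               for r in range(m + 1))

def p_hartley(F, m, nu):
    """Approximation (8)."""
    return (1.0 - (1.0 + 2.0 * F / nu) ** (-0.5 * nu)) ** m

def f_upper_point_2df(p, nu):
    """Upper 100p% point of F on (2, nu) d.f.: solves (1 + 2F/nu)**(-nu/2) = p."""
    return 0.5 * nu * (p ** (-2.0 / nu) - 1.0)

m, nu, alpha = 5, 20, 0.05
F_crit = f_upper_point_2df(alpha / m, nu)
print(F_crit, 1.0 - p_finney(F_crit, m, nu), 1.0 - p_hartley(F_crit, m, nu))
```

The size computed from (7) at this point should be a little below α, consistent with the remark that the approximation is intended only for upper percentage points.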


3. THE POWER OF THE $F_{\max}$ TEST


In order to deal with Kendall's objections against the use of criteria based on random series $y_t$, we must investigate the behaviour of the $F_{\max}$ test under alternate hypotheses. This is best linked with an examination of its power function; the appropriate set of alternate hypotheses is as follows:

${}_mH_1$: The series $y_t$ is composed of a systematic harmonic series $Y_t$ and a random remainder $\epsilon_t$, i.e. we have
$$y_t = Y_t + \epsilon_t = A_0 + \sum_{i=1}^{m}(A_i\cos it\gamma + B_i\sin it\gamma) + \epsilon_t,$$
where var $\epsilon_t = \sigma^2$. Of the m periods $i = 1, \ldots, m$, some (say l) have positive amplitudes and the remaining m − l have zero amplitudes. More precisely
$$A_i^2 + B_i^2 > 0 \quad \text{for } (i) \text{ in } (k),$$
where (k) is the subset of positive amplitudes in the total set of m periods;
$$A_i^2 + B_i^2 = 0 \quad \text{for } (i) \text{ in } (h),$$
where (h) is the set of m − l 'zero amplitude-periods' complementary to (k).

Under this hypothesis, $R^2$ is again distributed as $\chi^2$ with $\nu = n - 2m - 1$ degrees of freedom, the intensities $\tfrac12 nS_h^2$ are again independent $\chi^2$ values with 2 degrees of freedom, whilst the intensities $\tfrac12 nS_k^2$ are independent non-central $\chi^2$ values (e.g. Patnaik, 1949) with non-centralities given by $\lambda_k = \tfrac12 n(A_k^2 + B_k^2)/\sigma^2$.

Patnaik has shown that the distribution of a non-central $\chi^2$ having a non-centrality $\lambda$ and based on $n'$ degrees of freedom is, to a close degree of approximation, given by the distribution of $\rho\chi^2_{\nu'}$, where $\chi^2_{\nu'}$ is a central $\chi^2$ based on $\nu'$ degrees of freedom given by
$$\nu' = \frac{(n'+\lambda)^2}{n'+2\lambda}, \tag{9}$$
whilst the scale factor $\rho$ is given by
$$\rho = \frac{n'+2\lambda}{n'+\lambda}. \tag{10}$$
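Patnaik's constants in (9) and (10) are chosen so that ρχ² on ν' degrees of freedom has the same mean, n' + λ, and variance, 2(n' + 2λ), as the non-central χ². The sketch below computes them and confirms the moment matching; the function name and the values n' = 2, λ = 3 are illustrative.

```python
def patnaik(n_prime, lam):
    """Degrees of freedom (9) and scale factor (10) of Patnaik's approximation."""
    nu = (n_prime + lam) ** 2 / (n_prime + 2.0 * lam)
    rho = (n_prime + 2.0 * lam) / (n_prime + lam)   # scale factor, not a correlation
    return nu, rho

n_prime, lam = 2, 3.0
nu, rho = patnaik(n_prime, lam)
mean_approx = rho * nu              # mean of rho * chi-square(nu)
var_approx = 2.0 * rho * rho * nu   # variance of rho * chi-square(nu)
print(nu, rho, mean_approx, var_approx)
```

For n' = 2 the degrees of freedom always exceed 2 and the scale factor exceeds 1 whenever λ is positive, which is the property used later in bounding the chance of a misleading result.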










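To a close approximation, therefore, the probability, say $p_k(x)$, for $\tfrac12 nS_k^2$ to be below a given level x is equal to the probability of $\chi^2$ based on $\nu_k$ degrees of freedom being smaller than $x/\rho_k$; i.e.
$$p_k(x) = \frac{1}{2^{\frac12\nu_k}\Gamma(\tfrac12\nu_k)}\int_0^{x/\rho_k} e^{-\frac12\chi^2}(\chi^2)^{\frac12\nu_k-1}\,d\chi^2, \tag{11}$$
where, from (9) and (10),
$$\nu_k = \Bigl(2 + \frac{n}{2\sigma^2}(A_k^2+B_k^2)\Bigr)^2\Big/\Bigl(2 + \frac{n}{\sigma^2}(A_k^2+B_k^2)\Bigr), \qquad \rho_k = \Bigl(2 + \frac{n}{\sigma^2}(A_k^2+B_k^2)\Bigr)\Big/\Bigl(2 + \frac{n}{2\sigma^2}(A_k^2+B_k^2)\Bigr). \tag{12}$$
It will be noted that $\nu_k$ and $\rho_k$ depend on the non-centralities $(A_k^2 + B_k^2)/2\sigma^2$, and these parameters must be known in order to evaluate the power of the test.
The chance for all the m values of $\tfrac12 nS_i^2$ to be smaller than x is then given by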


$$p(x) = (1 - e^{-\frac12 x})^{m-l}\prod_{(k)} p_k(x). \tag{13}$$
We now denote the probability for $\sqrt{F_{\max}} = \tfrac12\sqrt{n}\,S_{\max}\sqrt{\nu}/R$ to fall below a given level ($\sqrt{F^*}$, say) by $P(\sqrt{F^*})$. This integral is then given by the 'studentized' expression of (13). Using $p(\xi) = p(2\xi^2)$, we have for the first two terms of the 'studentized' integral (Hartley, 1944)
$$P(\sqrt{F^*}) = p(\sqrt{F^*}) - \frac{1}{4\nu}(\sqrt{F^*}\,p' - F^*p''), \tag{14}$$
where $p'$ and $p''$ are the first two derivatives of p. The power of the $F_{\max}$ test can therefore be computed from (14) as 1 − P. This is the chance of establishing significance, using the $F_{\max}$ test, when the series is composed as defined by ${}_mH_1$ (p. 196). Its evaluation for specified values of the $(A_k^2 + B_k^2)/\sigma^2$, although laborious, is quite feasible. Numerical examples are given in §7.


4. THE CHANCE THAT THE $F_{\max}$ TEST IS MISLEADING


We now use the above expression of the power function to examine the frequency with which a significant result might be 'misleading' in the sense of the above criticism made by Kendall. Strictly speaking, if the maximum observed intensity $\tfrac12 nS_j^2$ is returned as significant by the test we should, of course, only conclude that ${}_mH_0$ is not true and that some such hypothesis as ${}_mH_1$ is true. In practice, however, we wish to conclude more specifically that $A_j^2 + B_j^2 > 0$ for the particular j for which the maximum intensity was observed. The total power, i.e. the chance of reaching a significant result when ${}_mH_1$ is true, is therefore the sum of the chances of two situations:

(i) When the observed maximum $\tfrac12 nS_j^2$ does, in fact, come from the set of positive intensities (k) (i.e. j is in (k)).

(ii) When the observed maximum $\tfrac12 nS_j^2$ does, in fact, come from the set (h) of true zero intensities (i.e. j is in (h)).

The frequency, therefore, of being 'misled' by a significant result of the test is given by the second part (ii) of the total power. Below we shall show that this frequency is, in fact, very small. We shall prove that this chance is smaller than (m − 1)/m times the error of the first kind, i.e. smaller than α(m − 1)/m, if the $F_{\max}$ test is carried out at the 100α % level of significance.










Differentiating (13) and integrating back over the range x < £<0o, we note that 1 — f(z) 
can be written as 


1-pay=[" > Pug) TL, pdG) x (eye 
i+r 


Ja rin (k) 


+f" TE pag) x (ret De tcag 
) 


« rin (k 

= P(x)+ P(x) (say). (15) 
We now note that the second term in (15) (P,(x)) represents the chance for the maximum 
}nSj to exceed x and yet to correspond to a period j in the set (h) of m—I zero intensities. 

We now consider the ‘studentized’ form of the test. Let $\phi_\nu(s)$ denote, as before, the distribution of a sample standard deviation based on $\nu$ degrees of freedom; then
1— P(./F*) = | $,(s) (1 — p(s? F*2)) ds. (16) 
0 


Substituting the expression (15) for 1—#, we have for the power 
1— P(/F*) = | $,(8) P,(2s*F*) ds + | ¢,(8) P,(2s?F'*) ds 
0 0 


= PJ F*)+ PJ P*) (say). (17) 
The second term in (17), P,(./F*), represents the frequency of reaching a misleading con- 
clusion by the test. In order to obtain an upper bound for P,, we use (15) and (11) and 
remember that the x? integral is a monotonically increasing function of x and hence a 
decreasing function of p, and a monotonically decreasing function of its degrees of freedom. 
Further, since from (12) it is obvious that v, > 2 and p, > 1, it follows that in (11) 

Pel) < (1—e-*), 

and therefore in (17) 


. 


P,(./ F*) <| ” $,(8) jm— | - (1 —e-t)"—1 e-4 dé ds 
0 2stF' 
m—lf@ Ar? \0 _m-l 
Sm I, ere ee eee m ™ (18) 


where, according to (6), $\alpha$ is the error of the first kind, i.e. the significance level chosen for the ‘studentized’ $F_{\max}$ test. It will be seen, therefore, that if the independent harmonic intensities are used in the $F_{\max}$ test, the chance of being ‘misled’ by this test in the sense defined is negligible. Kendall’s criticisms were, of course, mainly directed against tests involving dependent intensities, as in periodogram analysis. We shall discuss this aspect in §6 and consider it in more detail in a subsequent paper.


5. THE DISTRIBUTION OF THE MAXIMUM F RATIO WHEN THE OBSERVED SERIES HAS 
A GENERAL SYSTEMATIC COMPONENT 
We now proceed to examine the $F_{\max}$ test under the general hypothesis
$$H_Y:\quad y_t = Y_t + \epsilon_t \qquad (t = 0, 1, \ldots, n-1),$$
where the $\epsilon_t$ are independent random normal variates.
Assuming for convenience that $n$ be odd, and representing the $Y_t$ by their complete, finite Fourier expansion, we note that the above assumption is equivalent to
i(n—1) 
vn-vh}: y= Ag+ LY (A; cosity + B,sin ity) + ¢,. 
i=1 


i= 







H. O. HartLtey 199 


This hypothesis, therefore, differs from ${}_mH_1$ in that $\tfrac12(n-1) - m$ further real Fourier terms may occur in the representation of the $Y_t$. It is now convenient to express the magnitude of these additional terms as a percentage of the variance of the $\epsilon_t$ by introducing the non-centrality ratio
$$\lambda = \tfrac12 n \sum_{i=m+1}^{\frac12(n-1)} (A_i^2 + B_i^2)/\sigma^2. \tag{19}$$
$\lambda$ will in general depend on the choice of $m$, the maximum period for which a real amplitude is suspected, and $n$, the number of observations made. If we assume that the systematic series $Y_t$ are the ordinates of a smooth function $Y(x)$, then, from standard Fourier theory, it is easy to prove that, with $m$ chosen sufficiently large, $\lambda$ is as small as may be required for any $n > 2m$. In practice, therefore, if we suspect $Y_t$ to involve terms of any frequency up to order $m$, the $F_{\max}$ test can only be expected to detect all these if the number of observations $n > 2m$, i.e. if the interval at which observations are made is well below the smallest suspected half wave-length. If we have little information on the smallest half wave-length to be expected and decide on too small values of $m$ and $n$, the $F_{\max}$ test, which is based on the assumption $\lambda = 0$, will be biased by an amount depending on the value of $\lambda$. The effect of a positive $\lambda$ on the $F_{\max}$ probability integral is given by replacing $F$ by $(1 + \lambda/\nu)F$ and $\nu$ by $\nu'$ in (7) and (14), where
$$\nu' = (\nu+\lambda)^2/(\nu+2\lambda), \qquad \rho = (\nu+2\lambda)/(\nu+\lambda). \tag{20}$$


The above modification gives the approximate $F_{\max}$ distribution under the most general assumption of a systematic component in the series. We would, however, stress again that the test is inappropriate unless we know that the $Y_t$ can be represented as an aggregate of a moderate number of Fourier terms so that $m$ can be chosen with a moderate value and yet $\lambda$ will be expected to be 0 or small.
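The adjustment in (20) is simple arithmetic, and a minimal sketch may make it concrete. The function name and the returned tuple below are illustrative assumptions, not part of the paper; only the three formulas are taken from the text.

```python
def adjusted_f_parameters(nu, lam):
    """Quantities used to adjust the F_max integral for a positive lambda.

    Returns (nu_prime, rho, f_scale) where, following equation (20),
      nu_prime = (nu + lam)^2 / (nu + 2*lam)
      rho      = (nu + 2*lam) / (nu + lam)
    and f_scale = 1 + lam/nu is the factor by which F is multiplied.
    The function name and return layout are hypothetical.
    """
    if nu <= 0 or lam < 0:
        raise ValueError("nu must be positive and lam non-negative")
    nu_prime = (nu + lam) ** 2 / (nu + 2.0 * lam)
    rho = (nu + 2.0 * lam) / (nu + lam)
    f_scale = 1.0 + lam / nu
    return nu_prime, rho, f_scale


if __name__ == "__main__":
    # With lam = 0 nothing changes: nu' = nu, rho = 1, F is left as it is.
    print(adjusted_f_parameters(11, 0.0))
    # A positive lam raises the effective degrees of freedom and rescales F.
    print(adjusted_f_parameters(11, 5.0))
```

Note that $\rho\nu' = \nu + \lambda$, so the pair $(\rho, \nu')$ preserves the mean of the non-central residual sum of squares.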


6. HARMONIC ANALYSIS AS PERIODOGRAM ANALYSIS 


In certain problems in which we search for the unknown periods of an aggregate of Fourier terms, such a search can be made by a harmonic analysis choosing the total $x$ range over which observations were made as the fundamental wave-length and evaluating intensities over a doubly limited range $m' < i < m$. Often, however, such a wide interval periodogram will miss certain periods and intensities must be computed at a finer interval. In this case they will no longer be independent, but as their correlations will in general be small the distribution of the $F_{\max}$ test is still tractable. We hope to show this in a subsequent paper.


7. ILLUSTRATIVE EXAMPLES FOR THE TEST PROCEDURE 


The examples given below consist in short series and are given as illustrations only. 

(a) The monthly mean temperatures, in °F. for Greenwich during 1939-40, were taken 
from the Smithsonian World Weather Records (Clayton, H. H. & F. L., 1947), and are given 
below to the nearest degree and with 30° F. as origin §: 


        Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept.  Oct.  Nov.  Dec.
1939     12    13    13    18    23    29    31    33    29     18    19     8
1940      1     8    14    18    26    33    31    32    27     20    15     9


§ Brunt (1917, p. 210) describes a large-scale harmonic analysis of the Greenwich temperature 
records of earlier years but on different lines. 






















We may attempt a representation of temperature as a harmonic series with a 2-year fundamental period. The annual effect will then occur for period $i = 2$, and we allow for further superimposed Fourier terms for periods $i = 1, 3, 4, 5$ and 6. Below is given the analysis of variance table for such an analysis:











                 Sum of squares   D.F.   Mean square   F ratio
Total                1986.0        23         -            -
Period i = 1            1.4         2        0.7          0.1
       i = 2         1830.2         2      915.1        137.0
       i = 3           53.0         2       26.5          4.0
       i = 4            7.6         2        3.8          0.6
       i = 5           18.2         2        9.2          1.4
       i = 6            2.2         2        1.1          0.2
Residual               73.3        11        6.7           -























The ‘Residual’ component has been computed from (3) with Fourier coefficients $a_i$ and $b_i$ given by (2), whilst the ‘Period’ components are given by $12(a_i^2 + b_i^2)$. The maximum F ratio, 137, is that for the annual period ($i = 2$). If we wish to test significance at the 5% level and use the approximation referred to at the end of §2, we should compare this F ratio with the $5/m = \tfrac56$% point of the ordinary F distribution for 2, 11 degrees of freedom. The ratio is, of course, highly significant.

We now repeat the test for the residuals of the original series from the annual wave. These need not be computed, as their harmonic analysis would automatically reproduce the intensities of the original series shown for $i = 1, 3, \ldots, 6$ in the above table. The maximum of these five F ratios, 4.0, is that for the 8-months period ($i = 3$), and this should be compared with the $5/5 = 1$% point of F. This is 7.2, and the 8-months period is therefore not significant. This completes the test, as the search for significant periods must stop as soon as an insignificant result is reached.
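The stepwise procedure just described can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the author's computing scheme: it takes the sums of squares from the table above as given, uses the $5/m$% rule quoted in the text, and relies on the standard closed form for an F variate with 2 and $\nu$ degrees of freedom, $P(F > f) = (1 + 2f/\nu)^{-\nu/2}$. All function names are hypothetical.

```python
def f2_upper_tail(f, nu):
    """Upper-tail probability of an F variate with (2, nu) degrees of freedom."""
    return (1.0 + 2.0 * f / nu) ** (-nu / 2.0)


def f2_critical_point(tail, nu):
    """Value of F(2, nu) exceeded with probability `tail` (inverse of the above)."""
    return (nu / 2.0) * (tail ** (-2.0 / nu) - 1.0)


def stepwise_fmax(period_ss, residual_ss, residual_df, alpha=0.05):
    """Repeat the F_max test until an insignificant result is reached.

    period_ss maps each period i to its sum of squares (2 d.f. each).
    At each step the largest remaining F ratio is compared with the
    alpha/m point of F(2, residual_df), m being the number of periods
    still under test. Returns the periods judged significant, in order.
    """
    residual_ms = residual_ss / residual_df
    remaining = dict(period_ss)
    significant = []
    while remaining:
        m = len(remaining)
        period = max(remaining, key=remaining.get)
        f_ratio = (remaining[period] / 2.0) / residual_ms
        if f_ratio <= f2_critical_point(alpha / m, residual_df):
            break  # the search must stop at the first insignificant result
        significant.append(period)
        del remaining[period]
    return significant


if __name__ == "__main__":
    table = {1: 1.4, 2: 1830.2, 3: 53.0, 4: 7.6, 5: 18.2, 6: 2.2}
    print(stepwise_fmax(table, residual_ss=73.3, residual_df=11))
    print(round(f2_critical_point(0.01, 11), 1))  # the 1% point for 2, 11 d.f.
```

With the tabulated sums of squares, only the annual period ($i = 2$) survives, in agreement with the conclusion reached in the text.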

We may ask the question, how large a real amplitude of the 8-months period should have been for it to have a reasonable chance to be detected by the second application of the $F_{\max}$ test used above. The power (14) of the test depends on the values of all the five amplitudes $A_i^2 + B_i^2$ ($i = 1, 3, \ldots, 6$). We evaluate it for the situation when only the 8-months period ($i = 3$) has a real amplitude, measured by the non-centrality ratio
$$\lambda = 12(A_3^2 + B_3^2)/\sigma^2,$$
whilst the amplitudes for periods $i = 1, 4, 5$ and 6 are zero. These values are tabulated below for a few selected values of $\lambda$. Below these we show the corresponding power when $\nu = \infty$, i.e. when the test is based on a very long series. It will be seen that $\lambda$ must be of the order 20 (i.e. the ratio $\tfrac12(A_3^2 + B_3^2)/\sigma^2$ should be of the order $\tfrac56$) to be detectable with a reasonable chance:
Power of $F_{\max}$ test, m = 5, only one amplitude real

  λ          0      5      10     15     20
  ν = 11    0.05   0.15   0.44   0.59   0.78
  ν = ∞     0.05   0.29   0.62   0.85   0.95


The power is, of course, considerably larger if some of the other four amplitudes are real, as it will then contain the chance of detecting these other amplitudes instead of the one corresponding to $i = 3$. The power when all five non-centrality ratios are equal to $\lambda$ is shown below for $\nu = \infty$.
Power of $F_{\max}$ test, m = 5, all five amplitudes equal and greater than zero

  λ          0      5      10
  ν = ∞     0.05   0.78   0.99


(b) A similar analysis was carried out for the records of total precipitation at Greenwich 
during the same 24-months period. None of the periods was found to be significant. 


REFERENCES 


Brunt, D. (1917). The Combination of Observations. Cambridge University Press.

Clayton, H. H. & F. L. (1947). World Weather Records. The Smithsonian Institution, Washington.

Finney, D. J. (1941). Ann. Eugen., Lond., 11, 136.

Fisher, R. A. (1929). Proc. Roy. Soc. A, 125, 54.

Hartley, H. O. (1938). J. R. Statist. Soc. Suppl. 5, 80.

Hartley, H. O. (1944). Biometrika, 33, 173.

Kendall, M. G. (1946a). The Advanced Theory of Statistics. London: Charles Griffin and Co.

Kendall, M. G. (1946b). Contributions to the Study of Oscillatory Time Series. Cambridge University Press.

Patnaik, P. B. (1949). Biometrika, 36, 202.

Schuster, Sir Arthur (1898). Terr. Magn. Atmos. Elect. 3, 13.

Walker, Sir Gilbert (1914). Mem. Indian Met. Dep. 21, part 9.










THE NON-CENTRAL χ²- AND F-DISTRIBUTIONS AND
THEIR APPLICATIONS†


By P. B. PATNAIK, University College, London 


1. INTRODUCTORY 


In the Neyman-Pearson theory of testing statistical hypotheses, the efficiency of a statistical test is to be judged by its power of detecting departures from the null hypothesis. Thus besides knowing the random sampling distribution of a given statistic $T$ under this hypothesis, say $H_0$, it is also necessary to know the distribution of $T$ under admissible hypotheses alternative to $H_0$. Hence the power function of the test is obtained. In the case of the well-known tests using $\chi^2$, $t$ and $F$, the evaluation of their power functions involves the use of what have been called non-central distributions. For example, if we are applying the $t$-test to examine if a sample has come from a normal population with mean $\mu = 0$ ($H_0$), we know that under $H_0$ $t$ has a 5% chance of exceeding the 5% point of its distribution. But in order to compute the power of the test we wish to know the chance that $t$ exceeds this point when $\mu$ has alternative values, not equal to zero. This chance is given by the non-central $t$-integral. This distribution has been studied by Fisher (1931), Neyman (1935), Neyman & Tokarska (1936) and Johnson & Welch (1939). In a similar way, the non-central $\chi^2$- and $F$-distributions arise in consideration of the power functions of the $\chi^2$- and variance-ratio tests.

The power function may be used either to determine the extent of the departures from $H_0$ in a given direction, which will be detected as significant (at a prescribed level) with a given probability, or it may be used to determine in advance the size of experiment necessary to ensure that a worth-while difference will be established as significant, if it exists. But apart from its value in this connexion, the study of non-central distributions is of considerable interest. The mathematical forms of these distributions of $t$, $\chi^2$ and $F$ have been long known, but their use without extensive tabling has not been easy. The present paper is therefore concerned with two lines of investigation:

(a) The derivation of certain approximations to the probability integrals of (i) non-central $\chi^2$, and (ii) the ratio of non-central $\chi^2$ to an independent central $\chi^2$, which we have termed non-central $F$. These approximations, depending on tabled functions, permit easy calculation.

(b) Discussion of the ways in which these distributions may be used in connexion with the power functions of statistical tests.


2. THE NON-CENTRAL χ²-DISTRIBUTION


2.1. Geometrical derivation


As is well known, the statistic $\chi^2$ is defined as the sum of squares of (say) $n$ independent random deviates, $\xi_i$, all drawn from a normal population with mean, 0, and standard deviation, $\sigma$, viz.
$$\chi^2 = \sum_{i=1}^{n} \xi_i^2/\sigma^2.$$


† Part of a thesis approved for the degree of Ph.D. of the University of London.













If, however, the mean of $\xi_i$ is $a_i$ and we write
$$x_i = \xi_i - a_i,$$
then we have the non-central $\chi^2$ defined by
$$\chi'^2 = \sum_{i=1}^{n} (x_i + a_i)^2/\sigma^2.$$

The probability distribution of $\chi'^2$ has been obtained by Fisher (1928) as a particular case of the distribution of the multiple correlation coefficient. A purely analytical proof was given by Tang (1938). As $\chi'^2$ is a generalized form of $\chi^2$ it may be of interest to compare its geometrical representation with the familiar geometry of $\chi^2$. We therefore give a direct geometrical derivation of the $\chi'^2$-distribution.
Without loss of generality we shall assume in what follows that $\sigma = 1$, so that the probability law of $x$ is given by
$$p(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac12 x^2}. \tag{1}$$
Then
$$\chi'^2 = \sum_{i=1}^{n} \xi_i^2.$$

In the $n$-dimensional space of the $\xi$'s, suppose $O$ is the origin, $P$ the point $(\xi_1, \ldots, \xi_n)$, $A$ the point $(a_1, \ldots, a_n)$, $\angle POA = \theta$ and $M$ the foot of the perpendicular from $P$ on $OA$ as shown in Fig. 1. Then
$$OP^2 = \chi'^2, \qquad OA^2 = \sum a_i^2 = \lambda, \text{ say}.$$

[Fig. 1. Diagram of the points O, P, A and M.]

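From (1), the probability density at $P$ is proportional to
$$\exp\Big[-\tfrac12 \sum_{i=1}^{n} (\xi_i - a_i)^2\Big] = \exp[-\tfrac12 PA^2] = \exp[-\tfrac12(\chi'^2 + \lambda - 2\chi'\sqrt{\lambda}\cos\theta)]. \tag{2}$$

If we keep $OP$ and $\theta$ fixed, $P$ describes an $(n-1)$-dimensional sphere of radius $PM = \chi'\sin\theta$ with its surface area proportional to $(\chi'\sin\theta)^{n-2}$. If $\chi'$ is increased to $\chi' + d\chi'$ and $\theta$ to $\theta + d\theta$, then a disk of area $\chi'\,d\chi'\,d\theta$ moves round this surface and hence covers a volume proportional to
$$(\chi'\sin\theta)^{n-2}\,\chi'\,d\chi'\,d\theta.$$
To obtain the distribution of $\chi'$ alone, we integrate out $\theta$. Thus
$$P(\chi')\,d\chi' = C\int_0^{\pi} e^{-\frac12(\chi'^2 + \lambda - 2\chi'\sqrt{\lambda}\cos\theta)}\,(\chi'\sin\theta)^{n-2}\,\chi'\,d\theta\,d\chi',$$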
0 











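which is equivalent to
$$P(\chi'^2)\,d\chi'^2 = C e^{-\frac12(\chi'^2+\lambda)}\,(\chi'^2)^{\frac12 n-1}\,d\chi'^2 \times \int_0^{\frac12\pi} \big(e^{-\sqrt{\lambda}\chi'\cos\theta} + e^{\sqrt{\lambda}\chi'\cos\theta}\big)\sin^{n-2}\theta\,d\theta. \tag{3}$$

Expanding the integrand and integrating term by term, we find
$$P(\chi'^2) = C e^{-\frac12(\chi'^2+\lambda)}\,(\chi'^2)^{\frac12 n-1}\,\frac{\Gamma(\frac12)\,\Gamma[\frac12(n-1)]}{\Gamma(\frac12 n)}\Big\{1 + \frac{1}{n}\Big(\frac{\chi'^2\lambda}{2}\Big) + \frac{1}{n(n+2)}\,\frac{1}{2!}\Big(\frac{\chi'^2\lambda}{2}\Big)^2 + \ldots\Big\}.$$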


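If zero is substituted for $\lambda$, this reduces to the ordinary $\chi^2$-distribution which therefore gives us the value of $C$.

We then have
$$P(\chi'^2) = \frac{e^{-\frac12\chi'^2}\,e^{-\frac12\lambda}}{2^{\frac12 n}} \sum_{j=0}^{\infty} \frac{(\chi'^2)^{\frac12 n+j-1}\,\lambda^j}{\Gamma(\frac12 n+j)\,2^{2j}\,j!}. \tag{4}$$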
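The series (4) is straightforward to evaluate numerically. The following is a minimal sketch, not part of the paper: it sums the series by a term-to-term recurrence (each term is the previous one multiplied by $\chi'^2\lambda / \{4(\tfrac12 n + j)(j+1)\}$). The function name, tolerance and term limit are illustrative assumptions.

```python
import math


def noncentral_chi2_pdf(x, n, lam, tol=1e-15, max_terms=10_000):
    """Density of non-central chi-square at x, summed from the series (4).

    n is the number of degrees of freedom and lam the non-central
    parameter. For simplicity the density is returned as 0 for x <= 0
    (exact at x = 0 only when n > 2).
    """
    if x <= 0.0:
        return 0.0
    half_n = n / 2.0
    # j = 0 term: e^{-(x+lam)/2} x^{n/2-1} / (2^{n/2} Gamma(n/2)),
    # built in logs to avoid overflow of the individual factors.
    log_first = (-(x + lam) / 2.0 + (half_n - 1.0) * math.log(x)
                 - half_n * math.log(2.0) - math.lgamma(half_n))
    term = math.exp(log_first)
    total = term
    for j in range(max_terms):
        ratio = x * lam / (4.0 * (half_n + j) * (j + 1))
        term *= ratio
        total += term
        # Terms rise and then fall; stop once they are falling and negligible.
        if ratio < 1.0 and term <= tol * total:
            break
    return total


if __name__ == "__main__":
    # With lam = 0 the series collapses to the ordinary chi-square density.
    print(noncentral_chi2_pdf(3.0, 2, 0.0), 0.5 * math.exp(-1.5))
    print(noncentral_chi2_pdf(10.0, 4, 4.0))
```

Setting `lam = 0` leaves only the first term, which is the check on the constant $C$ used in the text.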





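2.2. Derivation through a transformation of variates


Next we will show that it is possible to effect a variate transformation so as to transform $\chi'^2$ into a sum of $(n-1)$ central squares and a single non-central square and then derive its distribution. Make the following orthogonal transformation:
$$\begin{aligned} y_1 &= c_{11}\xi_1 + c_{12}\xi_2 + \ldots + c_{1n}\xi_n,\\ &\;\;\vdots\\ y_n &= c_{n1}\xi_1 + c_{n2}\xi_2 + \ldots + c_{nn}\xi_n. \end{aligned} \tag{5}$$
Then
$$\sum_{j=1}^{n} y_j^2 = \sum_{i=1}^{n} \xi_i^2.$$
Generally, if $E(y_j) = c_{j1}a_1 + c_{j2}a_2 + \ldots + c_{jn}a_n = b_j$ ($j = 1$ to $n$), we have
$$\sum_{1}^{n} a_i^2 = \sum_{1}^{n} b_j^2. \tag{6}$$

Now we can make
$$b_1 = b_2 = \ldots = b_{n-1} = 0 \quad\text{and}\quad b_n = \sqrt{\textstyle\sum a_i^2} = \sqrt{\lambda}.$$
Thus $\chi'^2 = \sum \xi_i^2$ is distributed as $\sum_{1}^{n-1} y_j^2 + y_n^2$, the sum of the squares of $(n-1)$ normal variates with mean zero and the square of a single normal variate with mean $\sqrt{\lambda}$, the s.d.'s being unity. Writing
$$\chi'^2 = w, \qquad \sum_{1}^{n-1} y_j^2 = u \quad\text{and}\quad y_n^2 = v,$$
we see that $u$ has a $\chi^2$-distribution with $(n-1)$ degrees of freedom, that is,
$$p(u) = \frac{e^{-\frac12 u}\,u^{\frac12(n-3)}}{2^{\frac12(n-1)}\,\Gamma[\frac12(n-1)]},$$
and that $v$ follows the law
$$p(v) = \frac{e^{-\frac12(\sqrt{v}-\sqrt{\lambda})^2} + e^{-\frac12(-\sqrt{v}-\sqrt{\lambda})^2}}{2\sqrt{2\pi}\,\sqrt{v}} = \frac{1}{\sqrt{2\pi}}\,v^{-\frac12}\,e^{-\frac12(v+\lambda)}\Big(1 + \frac{v\lambda}{2!} + \frac{(v\lambda)^2}{4!} + \ldots\Big).$$

Hence, replacing $v$ by $(w-u)$ in the joint probability law $p(u, v)$, we have
$$p(u, w) = \frac{e^{-\frac12 w}\,e^{-\frac12\lambda}\,u^{\frac12(n-3)}\,(w-u)^{-\frac12}}{2^{\frac12 n}\,\Gamma(\frac12)\,\Gamma[\frac12(n-1)]}\Big\{1 + \frac{(w-u)\lambda}{2!} + \frac{(w-u)^2\lambda^2}{4!} + \ldots\Big\}.$$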











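Whence integrating with respect to $u$ from 0 to $w$, we obtain
$$p(w) = \frac{e^{-\frac12 w}\,e^{-\frac12\lambda}\,w^{\frac12(n-2)}}{2^{\frac12 n}\,\Gamma(\frac12 n)}\Big\{1 + \frac{1}{n}\Big(\frac{w\lambda}{2}\Big) + \frac{1}{n(n+2)}\,\frac{1}{2!}\Big(\frac{w\lambda}{2}\Big)^2 + \ldots\Big\},$$
that is,
$$P(\chi'^2) = \frac{e^{-\frac12\chi'^2}\,e^{-\frac12\lambda}\,(\chi'^2)^{\frac12 n-1}}{2^{\frac12 n}\,\Gamma(\frac12 n)}\Big\{1 + \frac{1}{n}\Big(\frac{\chi'^2\lambda}{2}\Big) + \frac{1}{n(n+2)}\,\frac{1}{2!}\Big(\frac{\chi'^2\lambda}{2}\Big)^2 + \ldots\Big\}, \tag{7}$$
which is seen to be the same as (4).

In this distribution of $\chi'^2$, $n$ may be called the number of degrees of freedom and $\lambda$, which is equal to the sum of the squares $\sum_{1}^{n} a_i^2$, the non-central parameter.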
1 


2.3. Conditional distribution of χ'² under linear constraints


Suppose the $\xi$'s are subject to $k$ ($< n$) linear constraints. These can be transformed into an orthogonal set represented, say, by the equations
$$\sum_{i} c_{ji}\xi_i = p_j \qquad (j = 1, \ldots, k), \tag{8}$$
where
$$\sum_{i} c_{ji}^2 = 1, \qquad \sum_{i} c_{ji}c_{li} = 0 \quad (j \ne l).$$

We make an orthogonal transformation of variates defined by the equations (5), so that $\sum\xi_i^2$ transforms to $\sum y_j^2$ and the $k$ constraints of (8) become simply $y_1 = p_1, \ldots, y_k = p_k$. To find the distribution of $\sum y_j^2$ subject to these conditions, we first see that, in virtue of the relations in (6), the joint probability law of the $\xi$'s,
$$P(\xi_1, \ldots, \xi_n) = C\exp\Big[-\tfrac12\sum_{1}^{n}(\xi_i - a_i)^2\Big],$$
transforms into
$$P(y_1, \ldots, y_n) = C\exp\Big[-\tfrac12\sum_{1}^{n}(y_j - b_j)^2\Big].$$

When $y_1, \ldots, y_k$ take respectively the constant values $p_1, \ldots, p_k$, we have the conditional probability law
$$P(y_{k+1}, \ldots, y_n \mid p_1, \ldots, p_k) = C'\exp\Big[-\tfrac12\sum_{k+1}^{n}(y_j - b_j)^2\Big]. \tag{9}$$
It can be shown from (9), as in §2.1, that the sum of the non-central squares $(y_{k+1}^2 + \ldots + y_n^2)$ is distributed as a $\chi'^2$ with $(n-k)$ degrees of freedom and parameter
$$\lambda = b_{k+1}^2 + \ldots + b_n^2.$$

From (6) we see that
$$y_{k+1}^2 + \ldots + y_n^2 = \sum\xi_i^2 - (p_1^2 + \ldots + p_k^2)$$
and
$$b_{k+1}^2 + \ldots + b_n^2 = \sum a_i^2 - (b_1^2 + \ldots + b_k^2) = \sum_{i=1}^{n} a_i^2 - \sum_{j=1}^{k}\Big(\sum_{i=1}^{n} a_i c_{ji}\Big)^2. \tag{10}$$

Hence $\big(\sum_{1}^{n}\xi_i^2 - \sum_{1}^{k} p_j^2\big)$ is distributed as a $\chi'^2$ with $(n-k)$ degrees of freedom and parameter $\lambda$, given by the expression in (10).








In particular, if there is only a single constraint on the $\xi$'s, given by
$$\sum_{1}^{n} c_i\xi_i = p, \qquad \sum_{1}^{n} c_i^2 = 1, \tag{11}$$
then $\big(\sum_{1}^{n}\xi_i^2 - p^2\big)$ follows a $\chi'^2$-distribution with $(n-1)$ degrees of freedom and
$$\lambda = \sum a_i^2 - \Big(\sum a_i c_i\Big)^2. \tag{12}$$
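As a small worked illustration of (10) and (12), the sketch below computes the reduced non-central parameter for given means and an orthonormal set of constraint coefficients. The function name and the example values are assumptions made for illustration; the code does not check that the constraint rows are orthonormal, which the formula requires.

```python
import math


def constrained_noncentrality(a, constraints):
    """Non-central parameter after k orthonormal linear constraints.

    a           : the means a_1, ..., a_n
    constraints : k rows (c_j1, ..., c_jn), assumed orthonormal
    Implements lam = sum a_i^2 - sum_j (sum_i a_i c_ji)^2, as in (10);
    a single row gives the special case (12).
    """
    lam = sum(ai * ai for ai in a)
    for row in constraints:
        projection = sum(ai * ci for ai, ci in zip(a, row))
        lam -= projection * projection
    return lam


if __name__ == "__main__":
    s = 1.0 / math.sqrt(3.0)
    # One constraint on the mean direction of three variates.
    print(constrained_noncentrality([1.0, 2.0, 2.0], [[s, s, s]]))
```

The example removes the component of $(1, 2, 2)$ along the equiangular direction, leaving $9 - 25/3$.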


3. APPROXIMATIONS TO THE χ'²-DISTRIBUTION
3.1. The χ²-approximation
Fisher (1928) has shown that the distribution function of $\chi'^2$ given by (4) can be expressed in terms of a Bessel function with imaginary argument. When $n$, the number of degrees of freedom, is odd, this can be reduced to elementary functions. When $n$ is even, we see that the probability integral of $P(\chi'^2)$ can be expressed as a double Poisson sum. However, in both cases, the labour of calculating the probability integral is considerable.

In his paper, Fisher has given a table of the upper 5% significance points of the $\chi'^2$-distribution for $n = 1$ to 7 and $\sqrt{\lambda} = 0\,(0.2)\,5.0$. Garwood† has an unpublished table of the lower 5% points for the same range of values of $n$ and $\lambda$. No tables of the probability integral are available. It may therefore be useful to have an easy method of determining the probability integral and percentage points sufficiently accurately for any given values. For this purpose we shall consider several approximations to the distribution of $\chi'^2$.

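The characteristic function of this distribution is easily seen to be
$$\phi(t) = \exp\Big[\frac{i\lambda t}{1-2it}\Big]\Big/(1-2it)^{\frac12 n}.$$

Hence we have the following cumulants:
$$\kappa_1 = n+\lambda, \quad \kappa_2 = 2(n+2\lambda), \quad \kappa_3 = 8(n+3\lambda), \quad \kappa_4 = 48(n+4\lambda), \tag{13}$$
the general $r$th cumulant being
$$\kappa_r = 2^{r-1}(r-1)!\,(n+r\lambda).$$

In the $\beta_1, \beta_2$ diagram, it was found that the point computed from the above $\kappa$'s moved close to and above the Type III line, and this suggested that we might fit a Type III distribution from the first two moments. This is given by
$$f(y) = \frac{e^{-\frac12 y}\,y^{\frac12\nu-1}}{2^{\frac12\nu}\,\Gamma(\frac12\nu)}, \tag{14}$$
where $y = x/\rho$,
$$\rho = \frac{n+2\lambda}{n+\lambda} = 1 + \frac{\lambda}{n+\lambda}, \qquad \nu = \frac{(n+\lambda)^2}{n+2\lambda} = n + \frac{\lambda^2}{n+2\lambda}. \tag{15}$$

This means that we are representing the distribution of $(\chi'^2/\rho)$ by that of $\chi^2$ with $\nu$ degrees of freedom, $\nu$ being in general a fraction.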
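That the fit in (15) reproduces the first two cumulants in (13) can be checked directly: a $\chi^2$ with $\nu$ degrees of freedom scaled by $\rho$ has mean $\rho\nu$ and variance $2\rho^2\nu$. The short sketch below does this check; the function name is an illustrative assumption.

```python
def first_approximation_parameters(n, lam):
    """Scale factor rho and fractional degrees of freedom nu from (15)."""
    rho = (n + 2.0 * lam) / (n + lam)
    nu = (n + lam) ** 2 / (n + 2.0 * lam)
    return rho, nu


if __name__ == "__main__":
    n, lam = 25, 10.0
    rho, nu = first_approximation_parameters(n, lam)
    # Mean and variance of rho * chi-square(nu) against kappa_1 and kappa_2.
    print(rho * nu, n + lam)
    print(2.0 * rho ** 2 * nu, 2.0 * (n + 2.0 * lam))
    print(round(nu, 2))
```

Since $\nu = n + \lambda^2/(n+2\lambda)$, the fitted degrees of freedom always exceed $n$ when $\lambda > 0$.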


† I am grateful to Dr F. Garwood for kindly making his table available to me for reference.




In what follows we shall write $x$ for $\chi'^2$, $p(x)$ for the true distribution of $\chi'^2$ with $n$ degrees of freedom and parameter $\lambda$ and $f(x)$ for the approximation to $p(x)$ obtained by assuming that $x/\rho = y$ is distributed as $\chi^2$ with $\nu$ degrees of freedom.

Then the probability integral
$$\int_0^x p(x)\,dx = \int_0^y p(y)\,dy$$
is approximately given by $\int_0^y f(y)\,dy$.

This integral can be expressed in the notation of the tables of the Incomplete Γ-function (K. Pearson, 1922) as $I(u, p)$, where
$$u = \frac{y}{\sqrt{2\nu}} = \frac{x}{\sqrt{2(n+2\lambda)}}, \qquad p = \frac{\nu}{2} - 1 = \frac{(n+\lambda)^2}{2(n+2\lambda)} - 1, \tag{16}$$
and could be evaluated by interpolation in these tables. For interpolation $u$-wise the second differences with Everett interpolation coefficients may be used, while linear interpolation $p$-wise seems adequate.

The approximations to the probability integral so obtained for certain values of $n$, $\lambda$ and $x$ are shown in Table 1 for comparison with the exact values. In some of these cases $x$ is the upper 5% point (Fisher) or the lower 5% point (Garwood), so that the exact values are 0.95 or 0.05. The others are directly computed. For many purposes, especially in connexion with power functions, the degree of accuracy given by this method may be considered quite adequate.
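The comparison of approximate and exact integrals can be reproduced without tables. The sketch below is an illustration under stated assumptions rather than the method used in the paper: the exact integral is obtained by integrating the series (4) term by term, which gives a Poisson-weighted sum of ordinary $\chi^2$ integrals with $n + 2j$ degrees of freedom, and the $\chi^2$ integral itself is summed from the power series of the incomplete Γ-function instead of being interpolated from Pearson's tables. All function names are hypothetical.

```python
import math


def regularized_lower_gamma(a, x, tol=1e-15, max_terms=100_000):
    """Incomplete gamma ratio P(a, x) = x^a e^{-x} * sum_k x^k / Gamma(a+k+1)."""
    if x <= 0.0:
        return 0.0
    term = math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
    total = term
    for k in range(1, max_terms):
        term *= x / (a + k)
        total += term
        if term <= tol * total:
            break
    return min(total, 1.0)


def chi2_cdf(x, df):
    """Probability integral of central chi-square (df may be fractional)."""
    return regularized_lower_gamma(df / 2.0, x / 2.0)


def noncentral_chi2_cdf(x, n, lam, tol=1e-13, max_terms=10_000):
    """Exact integral as a Poisson(lam/2) mixture of central chi-squares."""
    weight = math.exp(-lam / 2.0)
    covered = 0.0
    total = 0.0
    for j in range(max_terms):
        total += weight * chi2_cdf(x, n + 2 * j)
        covered += weight
        if covered >= 1.0 - tol:
            break
        weight *= (lam / 2.0) / (j + 1)
    return total


def first_approximation_cdf(x, n, lam):
    """The chi-square approximation: x/rho treated as chi-square(nu)."""
    rho = (n + 2.0 * lam) / (n + lam)
    nu = (n + lam) ** 2 / (n + 2.0 * lam)
    return chi2_cdf(x / rho, nu)


if __name__ == "__main__":
    for n, lam, x in [(4, 4.0, 17.309), (16, 32.0, 60.0)]:
        print(n, lam, x,
              round(first_approximation_cdf(x, n, lam), 4),
              round(noncentral_chi2_cdf(x, n, lam), 4))
```

For these two cases Table 1 reports 0.9492 against 0.9500, and 0.8329 against 0.8316.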


Table 1. Showing exact and approximate values of the χ'²-integrals, $\int_0^x p(x)\,dx$

   n     λ       x       Approx.    Exact
   4     4     1.765     0.0399     0.0500
   4     4    10.000     0.7191     0.7118
   4     4    17.309     0.9492     0.9500
   4     4    24.000     0.9913     0.9925
   4    10    10.000     0.3178     0.3148
   7     1     4.000     0.1621     0.1628
   7     1    16.004     0.9499     0.9500
   7    16    10.257     0.0430     0.0500
   7    16    24.000     0.5947     0.5898
   7    16    38.970     0.9482     0.9500
  12     6    24.000     0.8187     0.8174
  12    18    24.000     0.2936     0.2901
  16     8    30.000     0.7895     0.7880
  16     8    40.000     0.9626     0.9632
  16    32    30.000     0.0590     0.0609
  16    32    60.000     0.8329     0.8316
  24    24    36.000     0.1556     0.1567
  24    24    48.000     0.5333     0.5296
  24    24    72.000     0.9656     0.9667

















To find the percentage points of the $\chi'^2$ distribution, we first interpolate in the appropriate percentage point tables of the $\chi^2$ (e.g. Thompson, 1941) for $\nu$ degrees of freedom and then multiply the interpolate by $\rho$. Four-point Lagrangian interpolation formulae may be used. The approximate upper and lower 5% points obtained by this method for certain values of $n$ and $\lambda$ are given in Table 2, along with the exact values. Clearly the accuracy is not as good for the lower points as for the upper ones. Although the comparisons have had to be confined only to small values of $n$, since Fisher and Garwood have only given exact percentage points up to $n = 7$, from the closeness of the probability integral approximation (Table 1) we could still expect that the approximation to the percentage points would be fairly close for higher $n$.

These approximations based on the $\chi^2$ fit will be referred to in subsequent sections as the first approximation.
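The percentage-point recipe (find the $\chi^2$ point for $\nu$ degrees of freedom, then multiply by $\rho$) can be sketched as follows. This is an illustration only: where the text interpolates in printed tables, the sketch solves for the $\chi^2$ point by bisection on a series evaluation of the incomplete Γ-function, so small differences from the tabulated approximations are to be expected. Function names and the bisection bracket are assumptions.

```python
import math


def chi2_cdf(x, df, tol=1e-15, max_terms=100_000):
    """Central chi-square integral from the incomplete gamma power series."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    total = term
    for k in range(1, max_terms):
        term *= z / (a + k)
        total += term
        if term <= tol * total:
            break
    return min(total, 1.0)


def first_approximation_point(prob, n, lam, iterations=200):
    """Point x with approximate integral `prob`: rho times the chi-square point."""
    rho = (n + 2.0 * lam) / (n + lam)
    nu = (n + lam) ** 2 / (n + 2.0 * lam)
    low, high = 0.0, nu + 20.0 * math.sqrt(2.0 * nu) + 20.0  # generous bracket
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if chi2_cdf(mid, nu) < prob:
            low = mid
        else:
            high = mid
    return rho * 0.5 * (low + high)


if __name__ == "__main__":
    for n, lam in [(4, 4.0), (7, 16.0)]:
        print(n, lam,
              round(first_approximation_point(0.95, n, lam), 2),
              round(first_approximation_point(0.05, n, lam), 2))
```

For $n = 4$, $\lambda = 4$ Table 2 gives approximate points 17.38 and 1.95; for $n = 7$, $\lambda = 16$ it gives 39.16 and 10.64.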


Table 2. Showing exact and approximate values of the percentage points of the χ'²-distribution

               Upper 5% point        Lower 5% point
   n     λ     Approx.    Exact      Approx.    Exact
   2     1       8.63      8.64        0.20      0.17
   2     4      14.72     14.64        0.94      0.65
   2    16      33.35     33.06        6.89      6.32
   2    25      45.66     45.31       12.68     12.08
   4     1      11.72     11.71        0.93      0.91
   4     4      17.38     17.31        1.95      1.77
   4    16      35.69     35.43        8.36      7.88
   4    25      47.94     47.61       14.26     13.73
   7     1      16.01     16.00        2.51      2.49
   7     4      21.28     21.23        3.78      3.66
   7    16      39.16     38.97       10.64     10.26
   7    25      51.34     51.06       16.68     16.23























3.2. The normal approximation


It is known that, for $n > 30$, Fisher’s approximation, that $\sqrt{2\chi^2}$ is distributed as a normal variate $N(\sqrt{2n-1}, 1)$,† will give fairly close values to the probability integral and percentage points of the $\chi^2$-distribution. It can be shown that a similar normal approximation is available for the $\chi'^2$-distribution for large values of $n$ or $\lambda$.

First we shall show that $\chi'$ approaches normality with greater rapidity than $\chi'^2$.

If $x$ is written for $\chi'^2$, and $x_0$ is mean $x$, we have by Taylor’s theorem
$$x^{\frac12} = x_0^{\frac12} + \tfrac12(x-x_0)\,x_0^{-\frac12} - \tfrac18(x-x_0)^2\,x_0^{-\frac32} + \tfrac1{16}(x-x_0)^3\,x_0^{-\frac52} + \ldots,$$
$$x^{\frac32} = x_0^{\frac32} + \tfrac32(x-x_0)\,x_0^{\frac12} + \tfrac38(x-x_0)^2\,x_0^{-\frac12} - \tfrac1{16}(x-x_0)^3\,x_0^{-\frac32} + \ldots.$$


† Here and below the notation $N(a, b)$ is used to indicate that a variable is normally distributed with mean $a$ and standard deviation $b$.


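By taking expectations on both sides and substituting from (13) the moments of $x = \chi'^2$, we get $\mu'_1$ and $\mu'_3$ of $\chi'$. Also
$$\mu'_2(\chi') = \mu'_1(\chi'^2), \qquad \mu'_4(\chi') = \mu'_2(\chi'^2).$$
Hence we derive the following moments:
$$\mu'_1 = (n+\lambda)^{\frac12} - \frac{n+2\lambda}{4(n+\lambda)^{\frac32}} + \frac{n+3\lambda}{2(n+\lambda)^{\frac52}} - \frac{15(n+2\lambda)^2}{32(n+\lambda)^{\frac72}} + \ldots,$$
$$\mu'_2 = (n+\lambda),$$
$$\mu'_3 = (n+\lambda)^{\frac32} + \frac{3(n+2\lambda)}{4(n+\lambda)^{\frac12}} - \frac{n+3\lambda}{2(n+\lambda)^{\frac32}} + \frac{9(n+2\lambda)^2}{32(n+\lambda)^{\frac52}} + \ldots,$$
$$\mu'_4 = 2(n+2\lambda) + (n+\lambda)^2,$$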


from which we obtain 


Pa eT eee 7 Se 
4(n+A)! ' 2 2(n+A) 


— n+3A _3(n+ 2a? | 
34a)! 4(mtaye” 





Mg = 3+ O[(n+A)-]. 








=<. n*+ 4nd a ee -2 
Hence V1 a a Jan +A) (n+ 2Ayht a 3 = O[(n+A)-*]. 
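Comparing these with the corresponding coefficients of the $\chi'^2$-distribution, viz.
$$\sqrt{\beta_1} = \frac{\sqrt{8}\,(n+3\lambda)}{(n+2\lambda)^{\frac32}}, \qquad \beta_2 - 3 = \frac{12(n+4\lambda)}{(n+2\lambda)^2},$$
we see that $\chi'$ approaches normality faster than $\chi'^2$.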
From the above it follows that ./(2y’*) has mean ./{2(m+A)—(n+2A)/(n+A)} to order 
(n+A)-? and variance (n + 2A)/(n +A) to order (n+ A)-*. We can therefore regard 


$$\sqrt{\frac{2\chi'^2\,(n+\lambda)}{n+2\lambda}}$$
as distributed normally with mean
$$\sqrt{\frac{2(n+\lambda)^2}{n+2\lambda} - 1}$$
and variance unity.

This result may also be derived by taking the $\chi^2$-approximation to the $\chi'^2$-distribution and then using the known result that for large $\nu$, $\sqrt{2\chi^2}$ is distributed as $N[\sqrt{2\nu-1}, 1]$. For, substituting $\chi'^2/\rho$ for $\chi^2$ and the expressions in (15) for $\rho$ and $\nu$, we reach the same normal approximation.

Since $\nu > n$ from (15), it can be seen that the normal approximation to $\chi'$ with $n$ degrees of freedom will be better than the normal approximation to $\chi$ with the same degrees of freedom. Thus, for example, if $n = 25$, we have

  λ =   0      10      20      30      40
  ν =  25    27.22   31.15   35.59   40.24


Hence for sufficiently large values of n and A, the probability integral and percentage points 
may be obtained from the normal tables. Table 3 gives a comparison of some values of the 
probability integral, thus calculated, with the exact values. 
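The normal approximation needs only the normal integral, so it can be sketched in a few lines. The code below is an illustration, with a hypothetical function name; it treats $\sqrt{2\chi'^2/\rho}$ as normal with mean $\sqrt{2\nu - 1}$ and unit variance, with $\rho$ and $\nu$ from (15), and evaluates the normal integral through the error function.

```python
import math


def normal_approximation_cdf(x, n, lam):
    """Integral of non-central chi-square up to x on the normal approximation."""
    rho = (n + 2.0 * lam) / (n + lam)
    nu = (n + lam) ** 2 / (n + 2.0 * lam)
    # Standardized deviate: sqrt(2 x / rho) has mean sqrt(2 nu - 1), s.d. 1.
    z = math.sqrt(2.0 * x / rho) - math.sqrt(2.0 * nu - 1.0)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    for n, lam, x in [(16, 32.0, 30.0), (16, 32.0, 60.0), (24, 24.0, 72.0)]:
        print(n, lam, x, round(normal_approximation_cdf(x, n, lam), 4))
```

The three cases printed correspond to entries of Table 3, where the normal approximation is given as 0.0638, 0.8320 and 0.9686.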










Table 3. Values of the χ'²-integral on the normal approximation

   n     λ      ν       x     From χ²    From normal    Exact
  16    32    28.8     30     0.0590       0.0638       0.0609
  16    32    28.8     60     0.8329       0.8320       0.8316
  24    24    32.0     36     0.1556       0.1515       0.1567
  24    24    32.0     72     0.9656       0.9686       0.9667





3.3. Closer approximations to the χ'²-distribution


The probability function of $\chi'^2$ can be represented in the form of a series with the fitted probability function of $(\rho\chi^2)$ as the leading term and, from these mathematical expansions, closer approximations to the probability integral and percentage points may be obtained. Two methods will be briefly considered.


First method 


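The cumulants of the distribution $f(x)$, as defined on p. 207 above, are seen to be
$$\kappa^*_1 = n+\lambda, \quad \kappa^*_2 = 2(n+2\lambda), \quad \kappa^*_3 = \frac{8(n+2\lambda)^2}{n+\lambda}, \quad \kappa^*_4 = \frac{48(n+2\lambda)^3}{(n+\lambda)^2}, \tag{17}$$
the $r$th cumulant being
$$\kappa^*_r = 2^{r-1}(r-1)!\,\frac{(n+2\lambda)^{r-1}}{(n+\lambda)^{r-2}}.$$

Comparing these with the corresponding cumulants of $p(x)$ in (13), we find $\kappa^*_r > \kappa_r$ for $r > 2$. Let us write
$$\kappa_3 - \kappa^*_3 = c_3, \qquad \kappa_4 - \kappa^*_4 = c_4, \ldots. \tag{18}$$
Then the corresponding differences of cumulants of $p(y)$ and $f(y)$, as defined on p. 207, will be $c_3/\rho^3$, $c_4/\rho^4$, ....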
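The cumulants in (13) and (17) and their differences are easy to tabulate. The sketch below assumes the differences are taken as $c_r = \kappa_r - \kappa^*_r$, which is how (18) is read here, so that they come out negative for $r > 2$; the function names are illustrative.

```python
from math import factorial


def cumulant_exact(r, n, lam):
    """r-th cumulant of non-central chi-square, from (13)."""
    return 2 ** (r - 1) * factorial(r - 1) * (n + r * lam)


def cumulant_fitted(r, n, lam):
    """r-th cumulant of the fitted rho * chi-square(nu) distribution, from (17)."""
    return (2 ** (r - 1) * factorial(r - 1)
            * (n + 2.0 * lam) ** (r - 1) / (n + lam) ** (r - 2))


def cumulant_difference(r, n, lam):
    """c_r, taken here as exact minus fitted (zero for r = 1 and r = 2)."""
    return cumulant_exact(r, n, lam) - cumulant_fitted(r, n, lam)


if __name__ == "__main__":
    n, lam = 4, 4.0
    for r in range(1, 5):
        print(r, cumulant_exact(r, n, lam), cumulant_fitted(r, n, lam),
              cumulant_difference(r, n, lam))
```

For the third cumulant the difference reduces to $-8\lambda^2/(n+\lambda)$, which vanishes only when $\lambda = 0$.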
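By the application of the Edgeworth operator to $f(y)$ we have
$$p(y) = \exp\Big[-\frac{c_3}{6\rho^3}\frac{d^3}{dy^3} + \frac{c_4}{24\rho^4}\frac{d^4}{dy^4} - \ldots\Big] f(y) = \Big\{1 - \frac{c_3}{6\rho^3}D^3 + \frac{c_4}{24\rho^4}D^4 + \ldots + \frac{1}{2!}\Big(\frac{c_3}{6\rho^3}\Big)^2 D^6 + \ldots\Big\} f(y).$$
Hence the probability integral $\int_0^y p(y)\,dy$ is given by
$$\int_0^y f(y)\,dy + \Big[\Big\{-\frac{c_3}{6\rho^3}f''(y) + \frac{c_4}{24\rho^4}f'''(y) + \ldots\Big\} + \frac{1}{2!}\Big\{\Big(\frac{c_3}{6\rho^3}\Big)^2 f^{(5)}(y) + \ldots\Big\} + \ldots\Big]. \tag{19}$$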


Since the higher derivatives of $f(y)$ become smaller in value for a given $y$, we retain only the first term in the square brackets of (19) and get a second approximation to the probability integral in the form
$$\int_0^y f(y)\,dy - \frac{c_3}{6\rho^3}\,\frac{d^3}{dy^3}\int_0^y f(y)\,dy,$$
which can be written as
$$I(u, p) - \frac{c_3}{6\{2(n+2\lambda)\}^{\frac32}}\,\frac{\partial^3 I}{\partial u^3}. \tag{20}$$

When using the expression (20) for the evaluation of the integral, the computation of the first term $I(u, p)$ will, in general, require interpolation in the tables of the Incomplete Γ-function. We shall now show that by a suitable modification of the Everett interpolation formula, the second term in (20) can be accounted for and the whole expression computed in one calculation.

If $u_0$, $u_1$ are the tabulated values between which $u$ lies and $\Delta''_0$, $\Delta''_1$ the tabulated second differences, we have as an approximation
$$\frac{\partial^3 I}{\partial u^3} \approx (\Delta''_1 - \Delta''_0)\,10^3,$$
the interval for $u$ being 0.1 in the tables. Suppose $q$ is the fraction $(u-u_0)/(u_1-u_0)$, $E''_0$, $E''_1$ the second-order Everett interpolation coefficients corresponding to $q$ and
$$k = \frac{10^3\,c_3}{6\{2(n+2\lambda)\}^{\frac32}}.$$
Then (20) becomes
$$I(u_0, p)(1-q) + I(u_1, p)\,q + \Delta''_0(E''_0 + k) + \Delta''_1(E''_1 - k). \tag{21}$$

If $p$ is not a tabled value but lies between $p_0$ and $p_1$, then we evaluate the above expression for $p_0$ and for $p_1$ and then interpolate linearly for $p$.


Second method 


It is well known that by using the Edgeworth form of the Gram-Charlier Type A series, a frequency function can be normalized if it approaches normality asymptotically and if its cumulants are in increasing order of some quantity, $n^{-\frac12}$.

Goldberg & Levine (1946) have shown that by the method of normalization the percentage points of the $\chi^2$-distribution could be obtained to a fairly good degree of accuracy. A similar method might be applied usefully to the $\chi'^2$-distribution. However, a modified form of expansion with the fitted $\chi^2$-function as the first term will be found more suitable.

Let us standardize the variate $x$ (written for $\chi'^2$) by introducing
$$\xi = \frac{x - (n+\lambda)}{\sqrt{2n+4\lambda}}.$$
Then, using the same notation as before, the cumulants of the distribution $p(\xi)$ are
$$0, \quad 1, \quad \kappa_3/\kappa_2^{\frac32}, \quad \kappa_4/\kappa_2^{2}, \ldots.$$
Since $f(x)$ has the same mean and standard deviation as $p(x)$, we get for the cumulants of $f(\xi)$
$$0, \quad 1, \quad \kappa^*_3/\kappa_2^{\frac32}, \quad \kappa^*_4/\kappa_2^{2}, \ldots.$$

These cumulants, from the third onwards, are of orders $-\tfrac12, -1, -\tfrac32, \ldots$ in both $n$ and $\lambda$. Now let
$$\alpha = \alpha(\xi) = e^{-\frac12\xi^2}/\sqrt{2\pi},$$


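and let $\xi_3, \xi_4, \ldots$ be the Hermite polynomials of orders 3, 4, .... Then we have, arranging the terms in order of magnitude of $n$ (Kendall, I, 1945, §6.32),
$$p(\xi) = \alpha(\xi)\Big\{1 + \frac{\kappa_3}{6\kappa_2^{\frac32}}\,\xi_3 + \Big(\frac{\kappa_4}{24\kappa_2^{2}}\,\xi_4 + \frac{\kappa_3^2}{72\kappa_2^{3}}\,\xi_6\Big) + \ldots\Big\}. \tag{22}$$

There is a similar expansion for $f(\xi)$ with $\kappa^*_r$ in place of $\kappa_r$ ($r > 2$).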










Now we subtract formally this second series from the first, term by term, and transfer f(é) 
to the right-hand side. We then obtain 


le le le 
BY oe aan Asiticn | 33 2 
ple) = ft6i +246) | 546+ (sche ayes) + |, (23) 
where c,c, have the same meanings as in (18) and c,,, is written for (K,«K,K,— Kt KR A?). 
We know that the infinite series in (23) is not uniformly convergent. We can still integrate 
it formally term by term and make use of the first few terms to get a better approximation 


than that given by the integral of f(€) uione. Thus retaining terms up to O(n-*), we derive an 
approximation to the probability integral 


[; peas = | “ple ae 


in the form 


lo, 1 1 
[sevae+ aie] -556— (3,948 + 3586s) - (aos) ra hee tif) | 


The first term in (24) is our first approximation of §3.1 and the rest give a correction to it which is seen to result in a considerable improvement (see Table 4). For evaluating this expression, the values of the Hermite polynomials may be taken from Jorgensen’s tables (1916) if $\xi$ is an argument tabled there; otherwise they have to be directly calculated. $\alpha(\xi)$ may be found (without need for interpolation) from Tables of the Probability Functions, Vol. 2 (Federal Works Agency, New York, 1942).

The coefficients in (24) involve only differences of the cumulants and so are smaller than the corresponding coefficients in (22). Thus a closer approximation is likely to result from (24) than from the same order of terms in (22).

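For the percentage points, we employ the inversion of the Gram-Charlier series obtained
by Cornish & Fisher (1937). If $x$, $x'$ and $\xi$ are respectively the percentage points of the
distributions $p(x)$, $f(x)$ and $\alpha(\xi)$, then for a given probability level, we have

$$\frac{x-(n+\lambda)}{\sqrt{(2n+4\lambda)}} = \xi + \tfrac16\kappa_3(\xi^2-1) + \left[\tfrac1{24}\kappa_4(\xi^3-3\xi) - \tfrac1{36}\kappa_3^2(2\xi^3-5\xi)\right] + \ldots;$$

$\{x'-(n+\lambda)\}/\sqrt{(2n+4\lambda)}$ has a similar expansion with $\kappa_r^*$ in place of $\kappa_r$ $(r > 2)$. By differencing as before we
obtain an expression for $x$ in terms of $x'$ and $\xi$. Retaining terms up to $O(n^{-1})$, we find

$$x = x' + \sqrt{(2n+4\lambda)}\Big[\tfrac16 c_3(\xi^2-1) + \big[\tfrac1{24}c_4(\xi^3-3\xi) - \tfrac1{36}c_{33}(2\xi^3-5\xi)\big] + \big[\tfrac1{120}c_5(\xi^4-6\xi^2+3) - \tfrac1{24}c_{34}(\xi^4-5\xi^2+2) + \tfrac1{324}c_{333}(12\xi^4-53\xi^2+17)\big]\Big]. \tag{25}$$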

In this, $x'$ is our first approximation, and the correction improves it considerably even at
the lower end of the distribution. The values of the expressions in $\xi$ in (25) are directly
available for several probability levels from the table in Cornish & Fisher's paper.

The approximate values of the probability integral of the $\chi'^2$-distribution obtained by
these methods in a few cases are given in Table 4. Table 5 shows the approximate upper and
lower 5% points evaluated by method II.

Comparing the two methods for the probability integral, the second one, employing
terms of the Gram-Charlier series up to $O(n^{-3/2})$, gives greater accuracy and is to be preferred,
although from the point of view of labour and time involved, the first method is simpler and
easier to apply. With respect to the percentage points, the method using the Cornish-Fisher
inversion appears to be quite good, particularly at the upper points, but it does involve
a certain amount of labour.


Table 4. Closer approximations to the $\chi'^2$-integral

                         1st       2nd approx. method
  n     λ      x       approx.        I         II        Exact
  4     4    10.00     0.7191      0.7209     0.7119     0.7118
  4     4    24.00     0.9913      0.9917     0.9913     0.9925
  7    16    24.00     0.5947      0.5938     0.5869     0.5898
  7    16    38.97     0.9482      0.9504     0.9502     0.9500
 16     8    20.00     0.3380      0.3345     0.3368     0.3369
 16     8    40.00     0.9626      0.9632     0.9631     0.9632





























Table 5. Closer approximation to the $\chi'^2$-percentage points, using method II

                    Upper 5% point                    Lower 5% point
  n     λ      1st       2nd                     1st       2nd
             approx.   approx.    Exact        approx.   approx.    Exact
  2     4     14.72     14.67     14.64         0.945     0.574     0.646
  2    16     33.35     33.06     33.06         6.891     6.526     6.322
  4     4     17.38     17.33     17.31         1.954     1.731     1.765
  4    16     35.69     35.42     35.43         8.363     8.017     7.884
  7     4     21.28     21.27     21.23         3.789     3.750     3.664
  7    16     39.16     38.97     38.97        10.637    10.267    10.257








4. APPLICATIONS OF THE $\chi'^2$-DISTRIBUTION


4.1. The power function of the $\chi^2$-test
There are several possible applications of the non-central $\chi^2$-distribution in statistics. We
shall consider only a few of them. We will show here how this distribution arises in the study
of power functions of the $\chi^2$-tests and how the approximations of §3 are useful in this
connexion.

Suppose $\xi_1, \xi_2, \ldots, \xi_n$ are $n$ independent observations in a sample. If we make the null
hypothesis $H_0$, that the $\xi_i$ have been drawn from a normal population with mean zero and
s.d. unity, then if $H_0$ is true, the statistic $\chi^2 = \sum\xi_i^2$ will exceed $\chi^2_\alpha$, the $\alpha$-significance point of
the $\chi^2$-distribution, based on $n$ degrees of freedom, in a proportion $\alpha$ of the cases.

The power of the $\chi^2$-test is given by the probability that $\sum\xi_i^2$ exceeds $\chi^2_\alpha$ under some
alternative hypothesis. If as an alternative to $H_0$, we suppose that the $\xi_i$ have been drawn from
normal populations having unit s.d. but different means $a_i$, then $\sum\xi_i^2$ will follow the
non-central $\chi^2$-distribution with $n$ degrees of freedom and parameter $\lambda = \sum a_i^2$. Denoting this by
$p_n(\chi'^2 \mid \lambda)$, the power function is given by

$$\int_{\chi^2_\alpha}^{\infty} p_n(\chi'^2 \mid \lambda)\,d\chi'^2 = \beta(n, \lambda, \alpha). \tag{26}$$


Thus the power is a function of the single parameter $\lambda$ and we may write the null hypothesis
as $H_0(\lambda = 0)$ and an alternative as $H_1(\lambda)$, where $H_1$ is a composite hypothesis including the
family of alternatives for which $\sum a_i^2 = \lambda$.

It was shown in §3.1 that the $\chi'^2$-distribution is fairly well approximated by a Type III
distribution fitted from its first two moments. The power function $\beta$ could therefore be
evaluated quickly and fairly accurately by the method of the first approximation. When
greater accuracy is needed, one of the other methods described in §3.3 may be used.

We give here a table (Table 6) of values of the power of the $\chi^2$-test applied at the significance
level $\alpha = 0.05$, obtained by the second method of §3.3. The accuracy of these values in
different parts of the table can be judged from the closeness between the approximate and
exact values of the probability integral shown in Tables 1 and 4. In some of the cases tabled
there, the limit $x$ was chosen near to the 5% point of the corresponding $\chi^2$, so as to give a
value of

$$1 - \int_0^{x} p_n(x \mid \lambda)\,dx$$

in the neighbourhood of the power $\beta$. It is believed that, in general, there is three-figure
accuracy in Table 6.


Table 6. The power function of the $\chi^2$-test using a 5% significance level;
values of $\beta(n, \lambda, \alpha)$, where $\alpha = 0.05$

 n \ λ     2       4       6       8      10      12      14      16      18      20
   2     0.224   0.416   0.585   0.719   0.819   0.885   0.929   0.956   0.973   0.983
   3     0.195   0.357   0.518   0.655   0.762   0.841   0.897   0.935   0.958   0.974
   4     0.171   0.320   0.470   0.605   0.719   0.803   0.867   0.913   0.943   0.963
   5     0.157   0.292   0.432   0.565   0.678   0.769   0.839   0.891   0.927   0.952
   6     0.146   0.270   0.404   0.531   0.644   0.738   0.812   0.870   0.911   0.940
   7     0.138   0.251   0.378   0.502   0.614   0.710   0.788   0.849   0.895   0.928
   8     0.131   0.238   0.357   0.477   0.588   0.685   0.768   0.830   0.879   0.916
   9     0.125   0.225   0.339   0.455   0.564   0.661   0.744   0.811   0.863   0.903
  10     0.121   0.215   0.323   0.435   0.542   0.640   0.724   0.793   0.848   0.891
  12     0.113   0.198   0.297   0.402   0.505   0.601   0.686   0.759   0.818   0.866
  14     0.108   0.185   0.276   0.374   0.473   0.567   0.653   0.728   0.791   0.842
  16     0.103   0.174   0.259   0.351   0.446   0.538   0.623   0.699   0.764   0.819
  18     0.099   0.165   0.244   0.332   0.422   0.512   0.596   0.673   0.740   0.796
  20     0.096   0.158   0.232   0.315   0.402   0.489   0.572   0.648   0.716   0.775
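
Entries of Table 6 can be spot-checked without any of the approximations, at least for even $n$. The sketch below is a minimal check, assuming the standard representation of the non-central $\chi^2$ as a Poisson mixture of central $\chi^2$ variates with $n + 2j$ degrees of freedom and weights $e^{-\lambda/2}(\lambda/2)^j/j!$; for even degrees of freedom the central tail area is a finite sum, so only the Python standard library is needed. The restriction to even $n$, the bisection for $\chi^2_\alpha$ and all function names are choices made for this illustration, not the method used to compute the table.

```python
import math


def chi2_sf_even(x, df):
    """P(chi-square with even df exceeds x), via the finite Poisson-type sum."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def chi2_critical_even(alpha, df):
    """Upper alpha point of central chi-square (even df), found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even(hi, df) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf_even(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power_chi2_test(n, lam, alpha=0.05, terms=200):
    """Power beta(n, lam, alpha) of the chi-square test for even n."""
    crit = chi2_critical_even(alpha, n)
    weight = math.exp(-lam / 2.0)            # Poisson weight for j = 0
    power = 0.0
    for j in range(terms):
        power += weight * chi2_sf_even(crit, n + 2 * j)
        weight *= (lam / 2.0) / (j + 1)      # next Poisson weight
    return power


if __name__ == "__main__":
    for n, lam in [(2, 2.0), (4, 4.0)]:
        print(n, lam, round(power_chi2_test(n, lam), 3))
```

With $\lambda = 0$ the function returns the significance level itself, which is a convenient sanity check on the critical value.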














When $n$ or $\lambda$ is so large that $\nu = n + \lambda^2/(n + 2\lambda)$ is over 30, we may use the normal
approximation of §3.2 for obtaining the power function more quickly than by the method of the
$\chi^2$-approximation.

The above table can be used in a variety of ways: (a) For given $\lambda$ and $n$, we may ask what is
the chance of establishing significance at the 5% level? (b) For given $n$, we may ask how large
$\lambda$ must be to have, say, a 90% chance ($\beta = 0.90$) of establishing significance at the 5% level
when a real difference in the $a_i$ exists? (c) For given $\lambda$, we may ask how many observations are
necessary to have a chance $\beta$ of establishing significance?
An alternative graphical approach to the inverse problems (b) and (c) is indicated in §7.3,
p. 228 below.
4.2. Application to the $\chi^2$-test for the goodness of fit

The $\chi^2$-test for goodness of fit is concerned with the comparison of observed frequencies
with those expected under a given hypothesis. The latter may be the theoretical frequencies
of a discontinuous distribution or may be obtained by taking integrals of a continuous frequency
distribution over a set of class intervals. Denote the observed frequencies by $n_i$ and the
expected frequencies by $N\pi_i$ $(i = 1, 2, \ldots, k)$, where $k$ is the number of groups and $N$ the total
number of observations in the sample. Then

$$\sum_{i=1}^{k} n_i = \sum_{i=1}^{k} N\pi_i = N. \tag{27}$$

As is well known, the distribution of

$$\phi^2 = \sum_{i=1}^{k} \frac{(n_i - N\pi_i)^2}{N\pi_i}, \tag{28}$$

when the $N\pi_i$ are the true population expectations, may be related as an approximation to
that of the sum of squares of normal variables. To link up also with the non-central theory
discussed in §§2.1-2.3, the following approach may be adopted, although it must be realized
that the conclusions reached are not exact. As in all problems concerning $\phi^2$, it is generally
only possible to assess the degree of error involved, in samples of finite size, by specific
numerical comparisons.

As shown originally by K. Pearson (1900, 1916), the variances and co-variances of the $k$
frequencies $n_i$, restricted by the condition (27), are precisely those holding in the section

$$X_1 + X_2 + \ldots + X_k = 0 \tag{29}$$

of the $k$-dimensioned normal probability distribution whose probability density at
$(X_1, X_2, \ldots, X_k)$ is

$$p(X_1, X_2, \ldots, X_k) = \text{constant} \times \exp\left[-\tfrac12\sum_i \frac{X_i^2}{N\pi_i}\right]. \tag{30}$$

Thus, provided that the expectations $N\pi_i$ are large enough to prevent serious inaccuracy
from discontinuity effects or boundary limitations, relationships between the $n_i$ may be
treated as relationships, within the prime (29), between normal variables $X_i$ which in the
$k$-dimensioned space are distributed independently with zero means and variances $N\pi_i$.
With these limitations, we may write

$$x_i = \frac{X_i}{\sqrt{(N\pi_i)}} = \frac{n_i - N\pi_i}{\sqrt{(N\pi_i)}}. \tag{31}$$

The distribution of the $\phi^2$ defined in (28) can then be derived from the results given in §2.3.
The condition $\sum n_i = N$ may be written

$$\sum_i \sqrt{\pi_i}\,\frac{(n_i - N\pi_i)}{\sqrt{(N\pi_i)}} = 0, \tag{32}$$

corresponding to $\sum c_i x_i = p = 0$, where $\sum c_i^2 = 1$. Hence $\phi^2$ will be approximately distributed
as $\chi^2$ with $k-1$ degrees of freedom.










Having in mind the question of the power of the test, we may next ask what will be the
distribution of $\phi^2$ if the frequencies $N\pi_i$ inserted into the expression (28) are not the true
expectations? Suppose that $Np_i$ are the true expectations; both $\sum p_i$ and $\sum\pi_i$ will be unity.


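In the notation of §2 we now have

$$x_i + a_i = \frac{n_i - N\pi_i}{\sqrt{(Np_i)}}, \qquad x_i = \frac{n_i - Np_i}{\sqrt{(Np_i)}}, \qquad a_i = \frac{N(p_i - \pi_i)}{\sqrt{(Np_i)}}, \tag{33}$$

while

$$\sum_i \sqrt{p_i}\,\frac{(n_i - N\pi_i)}{\sqrt{(Np_i)}} = \sum_i c_i(x_i + a_i) = 0. \tag{34}$$

It follows that approximately

$$\phi'^2 = \sum_i (x_i + a_i)^2 = \sum_i \frac{(n_i - N\pi_i)^2}{Np_i} \tag{35}$$

will be distributed as a non-central $\chi^2$ with $k-1$ degrees of freedom and

$$\lambda' = \sum_i a_i^2 = N\sum_i \frac{(p_i - \pi_i)^2}{p_i}. \tag{36}$$

The sum of squares we need is the $\phi^2$ of (28), not the $\phi'^2$ of (35). By introducing a further
approximation we may, however, conclude that $\phi^2 = \sum (n_i - N\pi_i)^2/N\pi_i$ is distributed as
non-central $\chi^2$ with $k-1$ degrees of freedom, and

$$\lambda = N\sum_i \frac{(p_i - \pi_i)^2}{\pi_i}. \tag{37}$$†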
The approximation involved should not be serious if the differences $\delta_i = N\pi_i - Np_i$ are
small compared to $N\pi_i$; for

$$\sum_i \frac{(n_i - N\pi_i)^2}{Np_i} = \sum_i \frac{(n_i - N\pi_i)^2}{N\pi_i}\left(1 - \frac{\delta_i}{N\pi_i}\right)^{-1} = \sum_i \frac{(n_i - N\pi_i)^2}{N\pi_i} + \sum_i \delta_i\frac{(n_i - N\pi_i)^2}{(N\pi_i)^2} + \ldots$$

Since the multipliers $\delta_i$ in the second term may be positive or negative and $\sum\delta_i = 0$, this term
will generally be small; the further terms, containing successive powers of $\delta_i/(N\pi_i)$, will also
be of diminishing importance.

This result makes it possible to determine the power of the goodness of fit test of any
simple (completely specified) hypothesis $H_0$ (specifying probabilities $\pi_i$) with respect to a
simple alternative hypothesis $H_1$ (specifying probabilities $p_i$). Hence, for any given class of
alternatives $H_1$ we can determine the power function. In so far as the 5% significance level
is used, the power may be determined from Table 6, p. 214, using the $\lambda$ of equation (37) and
degrees of freedom $k-1$. Otherwise, we can use the $\chi^2$-approximation to the $\chi'^2$-distribution
developed in §3.1. Thus the power is

$$\int_{\chi^2_\alpha}^{\infty} p_{k-1}(\chi'^2 \mid \lambda)\,d\chi'^2 = \int_{\chi^2_\alpha/\rho}^{\infty} p_\nu(\chi^2)\,d\chi^2, \tag{38}$$

where

$$\rho = \frac{k-1+2\lambda}{k-1+\lambda}, \qquad \nu = \frac{(k-1+\lambda)^2}{k-1+2\lambda}. \tag{39}$$

† In making the approximation, we have associated the $\lambda$ of (37) with the distribution of $\phi^2$ rather
than the $\lambda'$ of (36), but this step perhaps needs fuller justification.













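For comparison of this approximate distribution with the exact one, we proceed now to
find the exact moments of $\phi^2$. It is known (e.g. Haldane, 1937) that under $H_1$ the expectations
of the powers of the observed frequency $n_i$ are

$$\left.\begin{aligned}
\mathcal{E}(n_i) &= Np_i,\\
\mathcal{E}(n_i^2) &= N_2p_i^2 + Np_i,\\
\mathcal{E}(n_i^4) &= N_4p_i^4 + 6N_3p_i^3 + 7N_2p_i^2 + Np_i,\\
\mathcal{E}(n_i^2n_j^2) &= N_4p_i^2p_j^2 + N_3(p_i^2p_j + p_ip_j^2) + N_2p_ip_j,
\end{aligned}\right\} \tag{40}$$

etc., where $N_r = N!/(N-r)!$.

Writing $\phi^2$ in (28) in the form

$$\phi^2 = \frac1N\sum_i \frac{n_i^2}{\pi_i} - N,$$

we have

$$\mathcal{E}(\phi^2) = \frac1N\sum_i \frac{\mathcal{E}(n_i^2)}{\pi_i} - N = \frac1N\sum_i \frac{N_2p_i^2 + Np_i}{\pi_i} - N.$$

Hence

$$\mu_1' = (N-1)\sum(p_i^2/\pi_i) + \sum(p_i/\pi_i) - N. \tag{41}$$

Again,

$$\mathcal{E}[(\phi^2)^2] = \frac1{N^2}\left[\sum_i \frac{\mathcal{E}(n_i^4)}{\pi_i^2} + \sum_{i\neq j}\frac{\mathcal{E}(n_i^2n_j^2)}{\pi_i\pi_j}\right] - 2\sum_i \frac{\mathcal{E}(n_i^2)}{\pi_i} + N^2,$$

from which on substitution and simplification we obtain

$$\begin{aligned}
\mu_2 = N^{-1}\{&(N-1)(6-4N)[\sum(p_i^2/\pi_i)]^2 + 4(N-1)(N-2)\sum(p_i^3/\pi_i^2)\\
&- 4(N-1)\sum(p_i^2/\pi_i)\sum(p_i/\pi_i) + 6(N-1)\sum(p_i^2/\pi_i^2)\\
&- [\sum(p_i/\pi_i)]^2 + \sum(p_i/\pi_i^2)\}.
\end{aligned} \tag{42}$$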

In a similar way the third moment has also been obtained but the expression is so long and
so difficult to evaluate numerically that it may not be of much value for comparison purposes.

When $p_i = \pi_i$ the above expressions reduce to those derived by Haldane (1937) for the
exact moments of the distribution of $\phi^2$ under the null hypothesis.
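
Formulae (41) and (42) can be checked numerically for small samples by brute force. The sketch below enumerates every multinomial outcome for a small $N$, computes the mean and variance of $\phi^2$ directly, and compares them with the closed expressions. The particular values of $N$, $p_i$ and $\pi_i$ are arbitrary choices for illustration, and the function names are hypothetical.

```python
import math
from itertools import product


def phi2(counts, N, pi):
    """Goodness-of-fit statistic (28) for one set of observed frequencies."""
    return sum((c - N * q) ** 2 / (N * q) for c, q in zip(counts, pi))


def enumerated_moments(N, p, pi):
    """Mean and variance of phi^2 under H1 by summing over all outcomes."""
    mean = 0.0
    second = 0.0
    for counts in product(range(N + 1), repeat=len(p)):
        if sum(counts) != N:
            continue
        prob = float(math.factorial(N))      # multinomial probability
        for c, pc in zip(counts, p):
            prob *= pc ** c / math.factorial(c)
        value = phi2(counts, N, pi)
        mean += prob * value
        second += prob * value ** 2
    return mean, second - mean ** 2


def formula_moments(N, p, pi):
    """Mean (41) and variance (42) of phi^2 from the closed expressions."""
    A = sum(a ** 2 / b for a, b in zip(p, pi))
    B = sum(a / b for a, b in zip(p, pi))
    C = sum(a ** 3 / b ** 2 for a, b in zip(p, pi))
    D = sum(a ** 2 / b ** 2 for a, b in zip(p, pi))
    G = sum(a / b ** 2 for a, b in zip(p, pi))
    mean = (N - 1) * A + B - N
    var = ((N - 1) * (6 - 4 * N) * A ** 2 + 4 * (N - 1) * (N - 2) * C
           - 4 * (N - 1) * A * B + 6 * (N - 1) * D - B ** 2 + G) / N
    return mean, var


if __name__ == "__main__":
    N, p, pi = 6, (0.5, 0.3, 0.2), (0.4, 0.4, 0.2)
    print(enumerated_moments(N, p, pi))
    print(formula_moments(N, p, pi))
```

The enumeration grows as $(N+1)^k$, so it is only practical for very small $N$ and $k$; it is meant as a check on the algebra, not as a computing method.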

The approximation to the distribution of $\phi^2$ obtained, using the simplification of §3.1,
will have the following first two moments:

$$\left.\begin{aligned}
\mu_1' &= \nu + \lambda = k-1+\lambda = k-1+N[\sum(p_i^2/\pi_i) - 1],\\
\mu_2 &= 2(\nu + 2\lambda) = 2(k-1) + 4\lambda = 2(k-1) + 4N[\sum(p_i^2/\pi_i) - 1],
\end{aligned}\right\} \tag{43}$$

using the expression for $\lambda$ in (37).

A comparison of these approximate moments with the exact ones, (41) and (42), appears
to be only possible numerically. Some comparisons have been made, including a check-up
on the whole distribution by a random sampling experiment. In the cases taken, the
approximation appeared satisfactory for practical purposes but some further investigation
is in hand. The results will be published in a subsequent paper.


4.3. Uses of the power function of the $\chi^2$ goodness of fit test
We have seen in §4.2 that, to the approximation involved, the power of the $\chi^2$-test for $H_0$
with regard to an alternative $H_1$ is a function of $k-1$, $\lambda$, $\alpha$ and can be written $\beta(k-1, \lambda, \alpha)$,
where $k$ is the number of groups, $\alpha$ the significance level at which the test is applied and

$$\lambda = N\sum_{i=1}^{k}\frac{(p_i - \pi_i)^2}{\pi_i} = N\Delta(H_0, H_1).$$

This shows that $\Delta$ is a function of $\pi_i$ and $p_i$, and can be regarded as a measure of 'discrepancy'
between the two distribution functions specified by $H_0$ and $H_1$.

The power function can be used to answer several questions connected with the test of
goodness of fit: (a) For given sample size $N$ and number of groups $k$, we may ask what is the
chance of establishing the inadequacy of the hypothesis $H_0$, using a given significance level?
(b) For given $k$, we may ask how many observations are necessary to give a chance of, say,
90% of establishing significance at the 5% level? (c) For given $k$ and $N$, we may ask how
large a departure of $H_1$ from $H_0$ (measured by $\Delta(H_0, H_1)$) will be detected with a given chance?

We shall illustrate these applications by an example from genetics. Consider the intercross

$$\frac{AB}{ab} \times \frac{AB}{ab},$$

where $A$ and $B$ are two independent factors, the recessive genes of which are represented by
$a$ and $b$. The offspring are of the four types [AB], [Ab], [aB], [ab] with frequencies in the
proportions 9, 3, 3, 1. We test whether the experiment is to confirm this theory or to reject it
in favour of a definite alternative giving frequencies proportional to 9, 3, 3r, r (r being less
than 1). This happens when the two classes of offspring containing the two recessive genes
(a, a) are less viable than those containing only one dominant gene, so that only a fraction of
the offspring survive.

Here, the expected frequencies are

$$\pi_i:\quad \frac{9}{16},\ \frac{3}{16},\ \frac{3}{16},\ \frac{1}{16};$$

$$p_i:\quad \frac{9}{4(3+r)},\ \frac{3}{4(3+r)},\ \frac{3r}{4(3+r)},\ \frac{r}{4(3+r)}.$$

Hence

$$\lambda = N\left(\frac{4(3+r^2)}{(3+r)^2} - 1\right), \tag{44}$$

where $N$ is the number of offspring studied. Then

$$\Delta = \Delta(H_0, H_1) = \frac{4(3+r^2)}{(3+r)^2} - 1. \tag{45}$$

Let us now consider the three situations where the power-function idea could be applied.

(a) Suppose we have 100 observations. Using the $\chi^2$-test at the 5% level to test the null
hypothesis ($r = 1$), the chance of establishing differential viability when $r = \tfrac12$ is obtained
by evaluating $\lambda$ from (37) and then entering Table 6 (p. 214) with this $\lambda$ and $n = k-1 = 3$.
Here $\lambda = 300/49$ and so the power $\beta = 0.52$.

(b) Suppose we want a 90% chance of detecting that $r = \tfrac12$, using the 5% significance
level. We find from Table 6 that $\lambda = 14.1$ and hence, putting $r = \tfrac12$ in (45), obtain $\Delta = 3/49$.
Then from (44) we find that we shall need a sample of $N = 230$.

(c) Again, if $N = 100$, $\alpha = 0.05$, we may ask how small $r$ must be to give a 50 : 50 chance for
establishing significance? We find $\lambda$ as before and solve (44) for $r$. Thus taking $\beta = 0.50$, then
$\lambda = 5.8$ and $r = 0.51$.
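
The arithmetic of cases (a) to (c) can be retraced in a few lines. The Python sketch below evaluates the discrepancy $\Delta$ directly from the two sets of cell probabilities, then uses $\lambda = N\Delta$; the values $\lambda = 14.1$ and $\lambda = 5.8$ are the Table 6 readings quoted in the text and are taken as given. For case (c) it uses the simplification $\Delta = 3(1-r)^2/(3+r)^2$, which follows algebraically from (45); that rearrangement and the function names are additions made for this illustration.

```python
import math
from fractions import Fraction


def discrepancy(r):
    """Delta(H0, H1) = sum (p_i - pi_i)^2 / pi_i for the 9:3:3r:r alternative."""
    r = Fraction(r)
    pi = [Fraction(9, 16), Fraction(3, 16), Fraction(3, 16), Fraction(1, 16)]
    total = 4 * (3 + r)
    p = [9 / total, 3 / total, 3 * r / total, r / total]
    return sum((pj - qj) ** 2 / qj for pj, qj in zip(p, pi))


def sample_size_for(lam, r):
    """N needed so that N * Delta reaches the required lambda, from (44)."""
    return lam / float(discrepancy(r))


def viability_for(lam, N):
    """Solve N * 3(1-r)^2/(3+r)^2 = lam for r in (0, 1)."""
    root = math.sqrt(lam / (3.0 * N))        # equals (1 - r)/(3 + r)
    return (1.0 - 3.0 * root) / (1.0 + root)


if __name__ == "__main__":
    delta = discrepancy(Fraction(1, 2))
    print("Delta =", delta)                           # case (a) and (b)
    print("lambda for N = 100:", float(100 * delta))  # case (a)
    print("N for lambda = 14.1:", sample_size_for(14.1, Fraction(1, 2)))
    print("r for lambda = 5.8, N = 100:", viability_for(5.8, 100))
```

Exact fractions are used for $\Delta$ so that the value $3/49$ quoted in case (b) is reproduced without rounding.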
4.4. A closer approximation to the power function of the $\chi^2$ goodness of fit test
In §4.2, when deriving the $\chi'^2$-approximation to the distribution of

$$\phi^2 = \sum_i \frac{(n_i - N\pi_i)^2}{N\pi_i},$$

we made the assumption that $\pi_i$ and $p_i$, the proportions of the expected frequencies under the
hypotheses $H_0$ and $H_1$, do not differ very much, so that we could regard $(n_i - Np_i)/\sqrt{(N\pi_i)}$ as
a normal deviate with zero mean and unit variance. We will now consider the distribution of
$\phi^2$ without making such an assumption and use it for obtaining a better approximation to the
power function.

We can write $\phi^2$ in the form

$$\phi^2 = \sum \frac{p_i}{\pi_i}\left[\frac{n_i - Np_i}{\sqrt{(Np_i)}} + \frac{N(p_i - \pi_i)}{\sqrt{(Np_i)}}\right]^2, \tag{46}$$

the summation being from $i = 1$ to $k$. Now, under $H_1$ the quantities $(n_i - Np_i)/\sqrt{(Np_i)}$ are
distributed approximately normally, as $N(0,1)$, subject to the constraint $\sum n_i = N$. Hence
$\phi^2$ in (46) can be regarded as the weighted sum of squares of $k$ normal deviates having different
expectations and satisfying the condition $\sum n_i = N$.

We have obtained in the Appendix (pp. 231-2 below) the characteristic function of the
distribution of such a statistic, viz. $\sum v_i(x_i + a_i)^2$ subject to the condition $\sum c_i(x_i + a_i) = p$.
Making the appropriate substitution in (6) of the Appendix, we have the characteristic
function of $\phi^2$:

$$\left(\sum\frac{p}{1 - 2itp/\pi}\right)^{-\frac12}\prod\left(1 - 2itp/\pi\right)^{-\frac12}\exp\left\{\frac{N}{2}\left[\sum\frac{(p-\pi)^2}{p(1 - 2itp/\pi)} - \sum\frac{(p-\pi)^2}{p} - \left(\sum\frac{p-\pi}{1 - 2itp/\pi}\right)^2\left(\sum\frac{p}{1 - 2itp/\pi}\right)^{-1}\right]\right\}, \tag{47}$$

where the subscripts of $p_i$ and $\pi_i$ are dropped. From this the expressions for the first three
moments are derived. Thus

$$\left.\begin{aligned}
\mu_1' &= (N-1)\sum(p_i^2/\pi_i) + \sum(p_i/\pi_i) - N,\\
\mu_2 &= 4(N-1)\sum(p_i^3/\pi_i^2) - 2(2N-1)[\sum(p_i^2/\pi_i)]^2 + 2\sum(p_i^2/\pi_i^2),\\
\mu_3 &= 24(N-1)\sum(p_i^4/\pi_i^3) - 24(2N-1)[\sum(p_i^2/\pi_i)][\sum(p_i^3/\pi_i^2)]\\
&\qquad + 8(3N-1)[\sum(p_i^2/\pi_i)]^3 + 8\sum(p_i^3/\pi_i^3).
\end{aligned}\right\} \tag{48}$$

It will be seen that the only assumption made here, that $(n_i - Np_i)/\sqrt{(Np_i)}$ is distributed
as $N(0,1)$ under $H_1$, is parallel to the assumption on which the $\chi^2$-test of goodness of fit is
based, namely, that $(n_i - N\pi_i)/\sqrt{(N\pi_i)}$ is distributed as $N(0,1)$ under $H_0$, which is justified
when $N\pi_i$ are not too small. So, when $Np_i$ are not too small we can expect the moments in
(48) to agree well with the true moments (the first two of which are given in (41) and (42)).
Obviously the expressions for $\mu_1'$ are identical. The values of $\mu_2$ in the cases examined in
the investigation referred to on p. 217 were found to be very close.

We may now obtain a representation of the distribution of $\phi^2$ under $H_1$ as a Type III having
the first two moments of (48), that is, assume $\phi^2/\rho$ as distributed as $\chi^2$ with $\nu$ degrees of
freedom, where $\rho = \tfrac12\mu_2/\mu_1'$, $\nu = 2\mu_1'^2/\mu_2$. Clearly this will be a better approximation than that
of the Type III fitted from the $\mu_1'$, $\mu_2$ given in (43), and the power function based on this will
be closer to the exact one than that based on (38) and (39). But, although there is gain in
accuracy, the simplicity of the approximate method is lost. We may similarly consider
fitting a Type III distribution, using the true $\mu_1'$ and $\mu_2$, but the labour of computation of $\mu_2$,
given in (42), appears to be prohibitive.
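
To make the fitting step of §4.4 concrete, the following Python sketch evaluates the three moments (48) for given $N$, $p_i$ and $\pi_i$, the simpler pair (43), and the Type III constants $\rho = \tfrac12\mu_2/\mu_1'$ and $\nu = 2\mu_1'^2/\mu_2$. Exact rational arithmetic is used so that the null case $p_i = \pi_i$ can be seen to return the central $\chi^2$ values $k-1$, $2(k-1)$ and $8(k-1)$ exactly. The genetics figures ($N = 100$, $r = \tfrac12$) are reused purely as an example, and the function names are hypothetical.

```python
from fractions import Fraction


def moments_48(N, p, pi):
    """First three moments (48) of phi^2 under H1."""
    pairs = list(zip(p, pi))
    A = sum(a ** 2 / b for a, b in pairs)
    B = sum(a / b for a, b in pairs)
    C = sum(a ** 3 / b ** 2 for a, b in pairs)
    D = sum(a ** 2 / b ** 2 for a, b in pairs)
    E = sum(a ** 4 / b ** 3 for a, b in pairs)
    F3 = sum(a ** 3 / b ** 3 for a, b in pairs)
    mu1 = (N - 1) * A + B - N
    mu2 = 4 * (N - 1) * C - 2 * (2 * N - 1) * A ** 2 + 2 * D
    mu3 = (24 * (N - 1) * E - 24 * (2 * N - 1) * A * C
           + 8 * (3 * N - 1) * A ** 3 + 8 * F3)
    return mu1, mu2, mu3


def moments_43(N, p, pi):
    """First two moments (43) of the simpler non-central approximation."""
    k = len(p)
    lam = N * (sum(a ** 2 / b for a, b in zip(p, pi)) - 1)   # lambda of (37)
    return k - 1 + lam, 2 * (k - 1) + 4 * lam


def type3_fit(mu1, mu2):
    """Scale rho and degrees of freedom nu of the fitted Type III curve."""
    return mu2 / (2 * mu1), 2 * mu1 ** 2 / mu2


if __name__ == "__main__":
    pi = [Fraction(9, 16), Fraction(3, 16), Fraction(3, 16), Fraction(1, 16)]
    r = Fraction(1, 2)
    total = 4 * (3 + r)
    p = [9 / total, 3 / total, 3 * r / total, r / total]
    mu1, mu2, mu3 = moments_48(100, p, pi)
    print("(48):", float(mu1), float(mu2), float(mu3))
    print("(43):", [float(m) for m in moments_43(100, p, pi)])
    print("rho, nu:", [float(v) for v in type3_fit(mu1, mu2)])
```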


5. CONDITIONAL POWER FUNCTIONS 


In §4 we have considered the power function of the $\chi^2$ goodness of fit test when the null
hypothesis is fully specified, i.e. is a simple hypothesis. But often we are interested in testing
whether an observed sample has come from a certain type of population, so that we are given
only the form of the population law, not the values of its parameters, say $\theta_1, \theta_2, \ldots, \theta_r$. $H_0$ is
then a composite hypothesis. Sometimes, also, we have to test the hypothesis that several
samples are from the same population, without specifying anything about it. In these cases
we obtain estimates of the unspecified parameters, say $T_1, T_2, \ldots, T_r$, from the sample and
hence calculate the expected cell frequencies $m_i$. Then, if the method of estimation is efficient†,

$$\phi^2 = \sum(n_i - m_i)^2/m_i \tag{49}$$

is known still to follow approximately a $\chi^2$-distribution with $k-r-1$ degrees of freedom.

Suppose now that as alternative to the composite hypothesis $H_0$, there is a simple
hypothesis $H_1$. The question then arises: By estimating the $m_i$ on the assumption that $H_0$ is true
and applying the $\chi^2$-test, what chance have we of rejecting $H_0$, when, in fact, $H_1$ is true?

Some consideration has been given to this problem, and it seems possible to obtain a
solution by making use, as a first step, of what David (1947, p. 339) has termed the conditional
power function. This gives the chance of rejecting $H_0$ when the test is confined to a restricted
set, $S$, of samples which provide the same values, say $T_1^*, T_2^*, \ldots, T_r^*$ for the estimated
parameters. Thus, if the process of fitting involves estimating two parameters from the sample
mean and variance, samples of a set would be those having a common mean and variance.
Again, in testing for independence in a contingency table, the conditional power function
would be obtained for a set of samples giving the same marginal totals (see Patnaik, 1948).
The development of this method will be left for a later communication.


6. THE NON-CENTRAL F'-DISTRIBUTION AND APPROXIMATIONS TO IT 


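Suppose two independent variates, $\chi_1'^2$ and $\chi_2^2$, follow respectively a non-central
$\chi^2$-distribution with degrees of freedom $\nu_1$ and parameter $\lambda$ and a $\chi^2$-distribution with degrees of freedom
$\nu_2$. Then the ratio

$$F' = \frac{\chi_1'^2/\nu_1}{\chi_2^2/\nu_2}$$

will have the following probability distribution:

$$p(F') = \sum_{j=0}^{\infty}\frac{e^{-\frac12\lambda}(\tfrac12\lambda)^j}{j!}\cdot\frac{(\nu_1/\nu_2)^{\frac12\nu_1+j}}{B(\tfrac12\nu_1+j,\ \tfrac12\nu_2)}\cdot\frac{F'^{\,\frac12\nu_1+j-1}}{(1+\nu_1F'/\nu_2)^{\frac12(\nu_1+\nu_2)+j}}, \tag{50}$$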


which may be termed the distribution of non-central $F$ or of $F'$. This corresponds to Fisher's
distribution C (1928). Wishart (1932) considered it in the form of the distribution of the
correlation ratio. Later, Tang (1938) derived the same from that of $\chi'^2$.
If in (50) we put $\nu_1 = 1$, then it reduces to the distribution of non-central $t^2$. Denoting the
non-central $t$ by $t'$, we have
,_2te 
uae, 


where $z$ is a normal deviate with expected value zero and $w$ is an unbiased estimate of its
variance. Neyman (1935), Neyman & Tokarska (1936) and Johnson & Welch (1939) have
dealt with this distribution in detail and studied its various applications. We will not
therefore consider here in particular this special case of the $F'$-distribution.

† I.e. gives a solution not very different from the maximum likelihood or minimum $\chi^2$ solutions,
which are nearly identical in large samples.

Taking the general form (50), we may, by analogy, call $\nu_1$, $\nu_2$ the degrees of freedom and $\lambda$
the non-central parameter. It can be seen that when $\nu_2$ tends to infinity the distribution of $F'$
reduces to that of $\chi'^2/\nu_1$.

The characteristic function is obtained as an infinite sum of confluent hypergeometric 
functions 

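$$\phi(t) = e^{-\frac12\lambda}\left[H\!\left(\tfrac12\nu_1,\ 1-\tfrac12\nu_2,\ -\frac{\nu_2}{\nu_1}it\right) + \sum_{j=1}^{\infty}\frac{(\tfrac12\lambda)^j}{j!}H\!\left(\tfrac12\nu_1+j,\ 1-\tfrac12\nu_2,\ -\frac{\nu_2}{\nu_1}it\right)\right],$$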


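in which the function, $H(a, b, x)$, is the sum of the series

$$1 + \frac{a}{b}x + \frac{a(a+1)}{b(b+1)}\frac{x^2}{2!} + \ldots$$

Thence we derive the following expressions for the first four moments about the origin:

$$\left.\begin{aligned}
\mu_1' &= \frac{\nu_2(\nu_1+\lambda)}{(\nu_2-2)\nu_1},\\
\mu_2' &= \frac{\nu_2^2}{(\nu_2-2)(\nu_2-4)\nu_1^2}\,[(\nu_1+\lambda)^2 + 2(\nu_1+2\lambda)],\\
\mu_3' &= \frac{\nu_2^3}{(\nu_2-2)(\nu_2-4)(\nu_2-6)\nu_1^3}\,[(\nu_1+\lambda)^3 + 6(\nu_1+\lambda)(\nu_1+2\lambda) + 8(\nu_1+3\lambda)],\\
\mu_4' &= \frac{\nu_2^4}{(\nu_2-2)(\nu_2-4)(\nu_2-6)(\nu_2-8)\nu_1^4}\,[(\nu_1+\lambda)^4 + 12(\nu_1+\lambda)^2(\nu_1+2\lambda)\\
&\qquad + 44(\nu_1+2\lambda)^2 + 48(\nu_1+4\lambda) - 32\lambda^2],
\end{aligned}\right\} \tag{51}$$

of which the first two were obtained by Wishart by a different method.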

Methods of evaluating the probability integral of the $F'$-distribution have been worked out
by Wishart and Tang. They involve a considerable amount of labour. Following the
procedure adopted in the case of $\chi'^2$, it may be possible to obtain a quick, though approximate,
method by fitting an $F$-distribution with the exact first two moments of $F'$. If we regard
$F'/k$ as following an $F$-distribution with $\nu$ and $\nu_2$ degrees of freedom, then, equating the
expressions for $\mu_1'$ and $\mu_2'$, we have

$$\frac{\nu_2(\nu_1+\lambda)}{k(\nu_2-2)\nu_1} = \frac{\nu_2}{\nu_2-2},$$

$$\frac{\nu_2^2}{k^2(\nu_2-2)(\nu_2-4)\nu_1^2}\,[(\nu_1+\lambda)^2 + 2(\nu_1+2\lambda)] = \frac{\nu_2^2(\nu+2)}{(\nu_2-2)(\nu_2-4)\nu},$$

which give the scale factor and the modified degrees of freedom, viz.

$$k = \frac{\nu_1+\lambda}{\nu_1}, \qquad \nu = \frac{(\nu_1+\lambda)^2}{\nu_1+2\lambda}. \tag{52}$$

The same result will follow if we approximate the distribution of $\chi_1'^2$ (the numerator in $F'$)
by a Type III from the first two moments as in §3.1.
Using the above approximation, the probability integral

$$\int_0^{F'} p_{\nu_1,\nu_2}(F' \mid \lambda)\,dF'$$

is approximately equal to

$$\int_0^{F'/k} p_{\nu,\nu_2}(F)\,dF,$$

where $k$ and $\nu$ are defined in (52). This can be expressed in the form of an Incomplete
B-function, viz.

$$I_x(\tfrac12\nu,\ \tfrac12\nu_2), \quad\text{where}\quad x = \frac{\nu F'/k}{\nu_2 + \nu F'/k}.$$

For given values of $\nu_1$, $\nu_2$, $\lambda$ and $F'$, we can therefore evaluate the integral from the Tables of
the Incomplete B-function (K. Pearson, 1934). When $\nu_2$ is even or, if odd, is less than 22, we
need interpolate only for $x$ and $\tfrac12\nu\,(= p)$. Four-point Lagrangian interpolation $p$-wise and
linear interpolation $x$-wise will be necessary.


Tang's tables of $P_{II}$ (the error of the second kind) (1938) give exact values of the integral
of the $E^2$-distribution, which, put in the $F'$-form, is

$$\int_0^{F_\alpha} p_{\nu_1,\nu_2}(F' \mid \lambda)\,dF', \tag{53}$$

$F_\alpha$ being the $\alpha$-percentage point of the $F$-distribution with $\nu_1$, $\nu_2$ degrees of freedom. Two levels
of $\alpha$ were chosen for the tables, namely, 0.05 and 0.01, and the range of $\nu_1$ is 1 to 8. The tables
have to be entered with $\phi = \sqrt{[\lambda/(\nu_1+1)]}$. Since $\phi$ is at intervals of 0.5, the corresponding
intervals for $\lambda$ are very wide, which therefore makes interpolation unsatisfactory.

Table 7. Approximate and exact values of the $F'$-integral, $\int_0^{x} p_{\nu_1,\nu_2}(F' \mid \lambda)\,dF'$

  ν1    ν2     λ      x (= F_α)    Approx.    Exact
   3    10     4       3.708        0.752     0.745
               4       6.552        0.919     0.918
              16       3.708        0.203     0.206
              16       6.552        0.520     0.517
   3    20     4       3.098        0.706     0.700
               4       4.938        0.889     0.887
              16       3.098        0.119     0.126
              16       4.938        0.350     0.347
   5    10     6       3.326        0.731     0.731
               6       5.636        0.913     0.914
              24       3.326        0.157     0.158
              24       5.636        0.463     0.461
   5    20     6       2.711        0.665     0.664
               6       4.103        0.869     0.870
              24       2.711        0.064     0.069
              24       4.103        0.244     0.245
   8    10     9       3.072        0.715     0.714
               9       5.057        0.909     0.908
              36       3.072        0.117     0.119
              36       5.057        0.409     0.408
   8    30     9       2.266        0.581     0.578
               9       3.173        0.815     0.813
              36       2.266        0.014     0.017
              36       3.173        0.085     0.088

Table 7 gives the values of the integral (53) calculated by the approximate method indi- 
cated above, for certain cases where Tang’s exact values are available. The comparison shows 
that, in general, we have two-figure accuracy, while the error in the third place appears to be 
quite small near the tails. 
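
The comparison in Table 7 can be retraced in outline as follows. The Python sketch computes the approximate integral from (52) and the incomplete B-function, and sets beside it a value obtained by summing the series (50) term by term, using the fact that each term integrates to an incomplete B-function with first argument $\tfrac12\nu_1 + j$. To stay within the standard library it assumes $\nu_2$ is even, in which case $I_x(a, b)$ with integer $b = \tfrac12\nu_2$ is a finite sum; this restriction, the truncation of the series at 200 terms and the function names are all choices made for the illustration.

```python
import math


def reg_inc_beta_integer_b(x, a, b):
    """I_x(a, b) for a positive integer b, using the finite-sum identity
    I_x(a, b) = x^a * sum_{j<b} Gamma(a+j)/(Gamma(a) j!) * (1-x)^j."""
    if b < 1 or int(b) != b:
        raise ValueError("b must be a positive integer")
    term, total = 1.0, 1.0
    for j in range(1, int(b)):
        term *= (a + j - 1) / j * (1.0 - x)
        total += term
    return x ** a * total


def _half_nu2(nu2):
    if nu2 <= 0 or nu2 % 2:
        raise ValueError("this sketch assumes an even nu2")
    return nu2 // 2


def patnaik_f_integral(f, nu1, nu2, lam):
    """Approximate integral of F' from 0 to f, via k and nu of (52)."""
    k = (nu1 + lam) / nu1
    nu = (nu1 + lam) ** 2 / (nu1 + 2 * lam)
    g = f / k
    x = nu * g / (nu2 + nu * g)
    return reg_inc_beta_integer_b(x, nu / 2, _half_nu2(nu2))


def series_f_integral(f, nu1, nu2, lam, terms=200):
    """Integral of (50) from 0 to f, summed term by term."""
    b = _half_nu2(nu2)
    x = nu1 * f / (nu2 + nu1 * f)
    weight = math.exp(-lam / 2)              # Poisson weight for j = 0
    total = 0.0
    for j in range(terms):
        total += weight * reg_inc_beta_integer_b(x, nu1 / 2 + j, b)
        weight *= (lam / 2) / (j + 1)
    return total


if __name__ == "__main__":
    # First row of Table 7: nu1 = 3, nu2 = 10, lambda = 4, limit 3.708.
    print(round(patnaik_f_integral(3.708, 3, 10, 4), 3))
    print(round(series_f_integral(3.708, 3, 10, 4), 3))
```

The approximate value computed this way need not agree with the tabled figure in the last place, since the table was obtained by interpolation in printed tables.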

It is to be noted that the table compares the integral at only two points, the 5 and 1%
points of the corresponding $F$-distribution. Due to the lack of exact values it has not been
possible to judge the closeness at other points. However, some idea of the general accuracy
could be had by comparing the true and approximate figures for different $\lambda$'s with the same
$\nu_1$, $\nu_2$ and $x\,(= F_\alpha)$.

It can be easily shown (see Hartley, 1948) that the maximum error in the $F'$-integral due to
our approximation will not exceed the maximum error in the corresponding $\chi'^2$-integral,
that is, in

$$\int_0^{\chi'^2} p(\chi'^2 \mid \lambda)\,d\chi'^2.$$

Table 1 on p. 207 gives an idea of the magnitude of the errors in the $\chi'^2$-integral, and so we can
say that the errors in the $F'$-integral will not be of a higher order.
The percentage points of $F'$ can be obtained by interpolation in the $F$-tables (Merrington
& Thompson, 1943), for the fractional $\nu$ and $\nu_2$ and multiplying the interpolate by $k$ in (52).
Closer approximations to the probability integral and percentage points may be derived
by the method based on the Gram-Charlier series, analogous to the second method of §3.3.


7. THE POWER FUNCTION OF THE ANALYSIS OF VARIANCE TESTS 


7.1. Evaluation of the power function

The test of a general linear hypothesis may be formulated as follows: Suppose $x_1, x_2, \ldots, x_N$
be $N$ normal variates with means $\xi_1, \xi_2, \ldots, \xi_N$ and the same s.d., $\sigma$. $\xi_i$ is a linear function of
$s < N$ parameters, $\theta_1, \theta_2, \ldots, \theta_s$. Thus

$$\xi_i = a_{i1}\theta_1 + a_{i2}\theta_2 + \ldots + a_{is}\theta_s.$$

The linear hypothesis specifies, say, $r$ of these parameters, i.e.

$$\theta_1 = \theta_1^0, \quad \theta_2 = \theta_2^0, \quad \ldots, \quad \theta_r = \theta_r^0. \tag{54}$$

It is possible by a suitable transformation of variates (see Tang, 1938) of the form

$$y_j = c_{j1}x_1 + c_{j2}x_2 + \ldots + c_{jN}x_N$$

to transform

$$T^2 = \sum_{i=1}^{N}(x_i - \xi_i)^2$$

into

$$T^2 = \sum_{j=1}^{N-s} y_j^2 + \sum_{j=N-s+1}^{N-s+r}(y_j - \eta_j)^2 + \sum_{j=N-s+r+1}^{N}(y_j - \eta_j)^2,$$

where $\eta_j$ in the second sum is a linear function of $\theta_1, \theta_2, \ldots, \theta_r$ and $\eta_j$ in the third sum is a linear
function of all the $\theta$'s, while the $a$'s and $c$'s enter as coefficients.


† [Further exploration shows that the differences between the approximate and true values are
systematic, with regular fluctuations. Use is being made of this fact to prepare certain rather more
extensive tables of the power function. Ed.]

To test the hypothesis (54) we consider the criterion 


min. (Af, .. eoeg ea, a ae A N-s+r is 
1} = a 2 2 55 
WE T30(0,, «5p, ---5 09) janeony YY | (55) 


If the hypothesis specifies such values for $\theta_1, \theta_2, \ldots, \theta_r$ that the $\eta_j$'s in (55) vanish, then the
numerator and denominator are the sums of $r$ and $N-s$ central squares respectively. So, the ratio of
the mean squares follows an $F$-distribution. On the other hand, if the $\eta_j$'s do not all vanish,
we have the ratio of a sum of $r$ non-central squares to the sum of $N-s$ central squares; hence,
the ratio of the mean squares is distributed as non-central $F'$, the parameter $\lambda$ being $\sum\eta_j^2$ which
can be expressed in terms of $\theta_1, \ldots, \theta_r$ (see Tang, p. 137). Thus we get the $F$-test of the analysis
of variance and obtain the power function of this test with respect to an alternative
hypothesis as an $F'$-integral.

We shall now consider the question of evaluating the power of the analysis of variance test 
by taking as an illustration the simple case of k groups of observations 


Zylt = 1, ...,%; ¢ = 1,...,&). 
where A is the general mean, B, the deviation of the mean of the tth group from the general 


mean so that 2B, = 0 and z,,’s are random residuals, distributed normally with mean zero and 


S.D. = 0». The expressions for the mean squares between groups and within groups follow 
from the set-up (56): 





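$$v_1 = \frac{n}{k-1}\sum_{t=1}^{k}(\bar{x}_{t.}-\bar{x}_{..})^2 = \frac{n}{k-1}\sum_{t=1}^{k}(\bar{z}_{t.}-\bar{z}_{..}+B_t)^2,$$

$$v_2 = \frac{1}{k(n-1)}\sum_{t=1}^{k}\sum_{i=1}^{n}(x_{ti}-\bar{x}_{t.})^2 = \frac{1}{k(n-1)}\sum_{t=1}^{k}\sum_{i=1}^{n}(z_{ti}-\bar{z}_{t.})^2,$$

where the symbols have the usual meanings. Since $(\bar{z}_{t.}-\bar{z}_{..})$ is a normal deviate with zero mean and variance $\sigma_0^2/n$, we see that $v_1$ is the sum of $k$ non-central squares subject to the linear constraint

$$\sum_{t=1}^{k}(\bar{z}_{t.}-\bar{z}_{..}+B_t) = 0.$$

Since further $\sum B_t = 0$, we find from the result of § 2.3 that $v_1$ is distributed as $\sigma_0^2\chi'^2/(k-1)$, where $\chi'^2$ has $(k-1)$ degrees of freedom and parameter

$$\lambda = n\sum B_t^2/\sigma_0^2.$$

Writing

$$S^2 = \left(\sum B_t^2\right)/k \tag{57}$$

for the variability between the groups, we have

$$\lambda = knS^2/\sigma_0^2. \tag{58}$$

Now $v_2$ follows the distribution of $\sigma_0^2\chi^2/[k(n-1)]$, where $\chi^2$ has $k(n-1)$ degrees of freedom. Hence $v_1/v_2$ is distributed as

$$\frac{\chi'^2/(k-1)}{\chi^2/[k(n-1)]},$$

i.e. as $F'$ with $\nu_1 = k-1$, $\nu_2 = k(n-1)$ and $\lambda$ given by (58).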
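The quantities in (57) and (58) are simple to compute once the group deviations are specified. The following Python sketch is only an illustration: the function name `noncentrality` and the example values of $B_t$, $n$ and $\sigma_0$ are assumptions made here and do not come from the paper.

```python
def noncentrality(effects, n, sigma0):
    """Return (S2, lam, nu1, nu2) for group deviations B_t and n observations per group."""
    k = len(effects)
    if abs(sum(effects)) > 1e-9:
        raise ValueError("group deviations B_t must sum to zero")
    s2 = sum(b * b for b in effects) / k      # S^2 = (sum B_t^2) / k, equation (57)
    lam = k * n * s2 / sigma0 ** 2            # lambda = k n S^2 / sigma_0^2, equation (58)
    return s2, lam, k - 1, k * (n - 1)        # nu1 = k - 1, nu2 = k(n - 1)


# Hypothetical example: k = 4 groups, n = 6 observations each, sigma_0 = 2.
s2, lam, nu1, nu2 = noncentrality([1.0, -1.0, 0.5, -0.5], n=6, sigma0=2.0)
print(s2, lam, nu1, nu2)
```

The same value of $\lambda$ is obtained from $n\sum B_t^2/\sigma_0^2$ directly, which is the form given just before (57).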


In this example we desire to test for any possible difference between the averages of the groups, so that our null hypothesis is

$$B_1 = B_2 = \ldots = B_k = 0. \tag{59}$$










P. B. Patnaik 225


Then, from (57), $S^2$ and therefore $\lambda$ is zero. Hence $v_1/v_2$ follows an F-distribution and we get an F-test. Thus the test of the hypothesis in (59) is based on the critical region

$$\frac{v_1}{v_2} > F_\alpha, \tag{60}$$

where $\alpha$ is the significance level at which we are testing.

Let us consider an alternative hypothesis that the $B_t$'s are not all zero. Then it is known that the power function, that is, the probability that $(v_1/v_2) > F_\alpha$, depends only on the single parameter

$$\frac{S^2}{\sigma_0^2} = \frac{\sum B_t^2}{k\sigma_0^2}.$$

Hsu (1941) has shown that amongst all critical regions of size $\alpha$ whose power functions depend on the single parameter $(S^2/\sigma_0^2)$, the critical region of (60) is the most powerful.

Thus we specify the hypothesis alternative to the null hypothesis (59) by the single parameter $S^2/\sigma_0^2$ in place of the individual parameters, the $B_t$'s. In certain situations, as, for
instance, in a manufacturing process, we are more interested in detecting the over-all 
variability in a set of machines than in detecting the deviation of each particular machine 
from the general machine average. Then the power function will be useful in measuring the 
chance of detecting this over-all variability by means of the F-test. 

The power function of the analysis of variance tests has been considered by Tang (1938) 
and Hsu (1941). The rather restricted scope of Tang’s tables has already been mentioned in 
§6. The labour involved in computing the exact values of the power is very heavy, and no 
tabling on an extensive scale has so far been found possible. However, with the approximations to the $F'$-distribution derived in § 6, we may obtain easily a sufficiently accurate value
for the power function of the test of any linear hypothesis. 


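Returning to the case of $k$ groups and $kn$ observations, we have the power function given by

$$\beta\left(\frac{S^2}{\sigma_0^2}\right) = \int_{F_\alpha}^{\infty} p(F' \mid \nu_1, \nu_2, \lambda)\,dF',$$

where $F_\alpha$ is the $\alpha$ percentage point of the F-distribution with degrees of freedom $\nu_1, \nu_2$. Following the procedure of § 6, this integral approximately equals

$$\int_{F_\alpha\nu_1/(\nu_1+\lambda)}^{\infty} p_{\nu,\nu_2}(F)\,dF, \tag{61}$$

where $\nu = (\nu_1+\lambda)^2/(\nu_1+2\lambda)$. Therefore, to this approximation, we have

$$\beta\left(\frac{S^2}{\sigma_0^2}\right) = I_x(\tfrac12\nu_2, \tfrac12\nu), \quad\text{where}\quad x = \frac{(\nu_1+2\lambda)\nu_2}{(\nu_1+2\lambda)\nu_2 + (\nu_1+\lambda)\nu_1F_\alpha}. \tag{62}$$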


7.2. The difference between systematic and random effects


Next we shall consider two alternatives that arise in practical situations, the random and systematic set-ups (see Daniels, 1939), which may best be described in terms of two examples:

If the groups in the previous illustration correspond to villages and the observations are the yields of fields in a crop survey, then we can regard the $k$ villages as a random sample from a population of villages and the random set-up represented by

$$x_{ti} = A + y_t + z_{ti} \tag{63}$$

becomes relevant. Here, $A$ is the general mean, the $y_t$'s are the group means, which are independent random variables with expected value zero and S.D. $= \sigma$, and the $z_{ti}$'s the random residuals having mean zero and S.D. $= \sigma_0$.

On the other hand, if the groups correspond to $k$ machines which, from the user's standpoint, constitute the entire population of machines, we cannot regard them as a sample, and so the systematic set-up in (56), considered on p. 224, is relevant. The null hypothesis in the random set-up is that the parameter $\sigma^2 = 0$, and in the other that $S^2 = 0$ (which is equivalent to (59)). But it is easily seen that both lead to the same F-test for the null hypothesis.

In applying the test, we are on the look out for the existence of alternative conditions, where in one case $\sigma^2$ and in the other $S^2$ is $> 0$. It will be noted that $(S^2/\sigma_0^2)$ of the systematic set-up corresponds to $(\sigma^2/\sigma_0^2)$ of the random set-up. Both are measures of relative variability between groups and may be termed 'relative group variability'.

It is possible to relate the power function under the random set-up to that under the systematic set-up. If we regard the $k$ groups as a sample from an infinite number of groups, then $\sum B_t^2/(k-1)$, i.e. $kS^2/(k-1)$, will be the sample estimate of the population variance $\sigma^2$. Thus treating $S^2$ as a random variable having a probability distribution denoted by $p(S^2 \mid \sigma^2)$, we can obtain the average power over all the $S^2$'s. Thus

$$\bar\beta = \int_0^\infty \beta(S^2/\sigma_0^2)\,p(S^2 \mid \sigma^2)\,dS^2$$

gives the power when the random set-up applies.

This power $\bar\beta$ for given $(\sigma^2/\sigma_0^2)$ is directly obtained (see Johnson, 1948) from the F-integral:

$$\bar\beta = \int_{F_\alpha(\nu_1+1)/(\nu_1+1+\lambda)}^{\infty} p_{\nu_1,\nu_2}(F)\,dF, \tag{64}$$

where $\nu_1 = k-1$, $\nu_2 = k(n-1)$ and $\lambda = kn\sigma^2/\sigma_0^2$. This can be put in the form of the Incomplete B-function

$$I_x(\tfrac12\nu_2, \tfrac12\nu_1), \quad\text{where}\quad x = \frac{(\nu_1+1+\lambda)\nu_2}{(\nu_1+1+\lambda)\nu_2 + (\nu_1+1)\nu_1F_\alpha}. \tag{65}$$


It is interesting to note a result which we believe is true in general and which on intuitional grounds might be expected to hold, namely, if the null hypothesis is not true, then for the same numerical values of the ratios $S^2/\sigma_0^2$ and $\sigma^2/\sigma_0^2$, the power of the F-test is greater in the systematic case than in the random. Four particular cases have been examined numerically as follows:











                                           (a)   (b)   (c)   (d)
Number of groups, k                         4     4    12    10
Number of observations in each group, n     6    11     6    11































Values of the power have been calculated, using equations (62) and (65), and are plotted in Fig. 2(a)-(d) as ordinates against $\sigma^2/\sigma_0^2$ $(= S^2/\sigma_0^2)$. We find from these that the systematic power curve lies above the other; further, we note that the curves are closer to one another in (c) and (d) than in (a) and (b), a fact which agrees with theory that the two power functions must tend to each other with increasing $k$. The errors of approximation in calculating the power in the systematic case are likely to be small, judged by the comparative Table 7, and should not affect the relative positions of the power curves.


























[Fig. 2. Power curves for the random and systematic set-ups for $k$ groups with $n$ observations in each: (a) $k = 4$, $n = 6$; (b) $k = 4$, $n = 11$; (c) $k = 12$, $n = 6$; (d) $k = 10$, $n = 11$. Abscissa: $\sigma^2/\sigma_0^2$, from 0 to 1.0; ordinate: power, from 0 to 1.0. Broken line, random; continuous line, systematic.]


This relation may be interpreted in a different way. Taking case (a) above, it will be seen from Fig. 2(a) that we can detect, for instance, a 'systematic' relative group variability of 0.45 with a 70% chance, while we cannot, with the same chance, detect a variability of magnitude less than 0.9 in the random case. The difference is of course to be expected. For the random set-up, our appreciation of $\sigma^2$ is obscured by random variations in both $y$ and $z$ of equation (63); for the systematic set-up, our appreciation of $S^2$ is only obscured by random fluctuations in the $z$ of equation (56).
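The contrast between the two set-ups in case (a) can be checked roughly by simulation. The sketch below is not the paper's method (which uses equations (62) and (65)); it is a Monte Carlo illustration under stated assumptions: $\sigma_0 = 1$, the 5% point of F with 3 and 20 degrees of freedom taken as 3.10 from standard tables, and a hypothetical choice of systematic deviations $B_t = \pm\sqrt{0.45}$ giving $S^2 = 0.45$. All function and variable names are introduced here for illustration.

```python
import random

F_CRIT = 3.10   # upper 5% point of F with 3 and 20 degrees of freedom (standard tables)


def f_ratio(groups):
    """Between-group mean square divided by within-group mean square."""
    k, n = len(groups), len(groups[0])
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k
    between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (k * (n - 1))
    return between / within


def simulated_power(draw_effects, n, reps, rng):
    """Fraction of simulated experiments in which the F-test rejects at the 5% level."""
    hits = 0
    for _ in range(reps):
        # One replicate: each group mean effect plus n residuals with sigma_0 = 1.
        groups = [[b + rng.gauss(0.0, 1.0) for _ in range(n)] for b in draw_effects(rng)]
        if f_ratio(groups) > F_CRIT:
            hits += 1
    return hits / reps


K, N, REPS = 4, 6, 10000
rng = random.Random(1949)

c = 0.45 ** 0.5                      # fixed deviations with S^2 = sum(B_t^2)/k = 0.45
power_systematic = simulated_power(lambda r: [c, -c, c, -c], N, REPS, rng)

sd_groups = 0.9 ** 0.5               # random group effects with sigma^2 = 0.9
power_random = simulated_power(
    lambda r: [r.gauss(0.0, sd_groups) for _ in range(K)], N, REPS, rng)

print(power_systematic, power_random)
```

Both estimates should fall near the 70% chance read from Fig. 2(a), the systematic set-up reaching it at half the relative group variability needed in the random set-up.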







7.3. Applications of the power-function

We will be concerned here mainly with the systematic set-up and will illustrate the 
application of our results, taking the simple case of k groups and n observations. The treat- 
ment is, however, quite general and could be applied to any designed experiment as outlined 
in the general statement given at the beginning of § 7.1.

Two types of question may be asked in connexion with the test for differences between groups:

(a) What is the extent of departure from the null hypothesis, measured by $(S^2/\sigma_0^2)$, that could be detected with a given chance?

(b) How many observations have we to take in each group so that we could detect a given ratio of between-group to within-group variability $(S^2/\sigma_0^2)$ with a prescribed chance?

To answer these questions we have to examine the function $\beta(S^2/\sigma_0^2)$, which may be written in the form
—s (32) . 
Perr Mia) = 8 % TABU, +E Pa) neato 
and consider its inverse, i.e. $\lambda = \lambda(\nu_1, \nu_2, \alpha, \beta)$. Generally, $\lambda$ has to be obtained by inverse interpolation from tables of $\beta$ such as Tang's. The interval of tabulation of 0.5 for

$$\phi = \sqrt{\lambda/(\nu_1+1)}$$

in Tang's tables is not fine enough for interpolation to be satisfactory. Still, they give a trial value of $\phi$ for which $\beta$ is calculated and then corrected with the help of the derivative $\partial\beta/\partial\phi$. Following this rather laborious method, Emma Lehmer (1944) has tabled $\phi$ for $\alpha = 0.01$, 0.05 and $\beta = 0.7$, 0.8 and for a wide range of $\nu_1$ and $\nu_2$. For these two values of the power we may use her tables to obtain our $\lambda$. It would clearly be of value for these tables to be extended.

We may, however, for any set of values of $\nu_1$, $\nu_2$, $\alpha$ and $\beta$, get $\lambda$ approximately with the help of the approximate form of $\beta$ given in (61). Taking a trial value of $\lambda$ we can find two consecutive integers $\lambda_1$, $\lambda_2$ between which $\lambda$ lies by the following method. From the expression (61) for $\beta$ we see that $\lambda$ must satisfy the relation





x14) (1-2) dz, from (50), 


$$F_\beta(\nu, \nu_2) = \frac{\nu_1}{\nu_1+\lambda}\,F_\alpha(\nu_1, \nu_2), \tag{66}$$

where the arguments $\nu, \nu_2$ and $\nu_1, \nu_2$ are the degrees of freedom. Hence the two integers $\lambda_1$ and $\lambda_2$ would make the right-hand side of (66) just greater and just less than the left-hand side. These can be got by trial and error, taking the $\alpha$ and $\beta$ percentage points from the F-tables and comparing the two sides. (It is to be noted that $\nu$ in (66) involves $\lambda$.) For these values of $\lambda_1$ and $\lambda_2$, $\beta$ is then evaluated using (62) and by backward interpolation $\lambda$ is determined.
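With a computer the trial-and-error step can be replaced by a direct numerical search. The sketch below evaluates the approximate power (62) and then finds $\lambda$ by bisection; bisection is a substitute chosen here for the paper's bracketing between consecutive integers and backward interpolation. The incomplete B-function is computed by a standard continued fraction, the central F percentage point by bisection on its tail area, and all function names are assumptions introduced for this illustration.

```python
import math


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b), via a continued fraction."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        return 1.0 - betainc(b, a, 1.0 - x)   # symmetry keeps the fraction convergent
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x)) / a
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 1000):
        # Even and odd terms of the continued fraction (modified Lentz method).
        for num in (m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            step = c * d
            h *= step
        if abs(step - 1.0) < 1e-14:
            break
    return front * h


def f_upper_tail(f, nu1, nu2):
    """P(F > f) for a central F with (nu1, nu2) degrees of freedom."""
    return betainc(nu2 / 2.0, nu1 / 2.0, nu2 / (nu2 + nu1 * f))


def f_critical(nu1, nu2, alpha):
    """Upper alpha point of the central F-distribution, by bisection."""
    lo, hi = 0.0, 1.0
    while f_upper_tail(hi, nu1, nu2) > alpha:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f_upper_tail(mid, nu1, nu2) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def approx_power(nu1, nu2, lam, alpha=0.05):
    """Approximate power of the F-test from equations (61) and (62)."""
    f_alpha = f_critical(nu1, nu2, alpha)
    nu = (nu1 + lam) ** 2 / (nu1 + 2.0 * lam)
    x = (nu1 + 2.0 * lam) * nu2 / ((nu1 + 2.0 * lam) * nu2 + (nu1 + lam) * nu1 * f_alpha)
    return betainc(nu2 / 2.0, nu / 2.0, x)


def lambda_for_power(nu1, nu2, power, alpha=0.05):
    """Value of lambda giving the required approximate power, by bisection."""
    lo, hi = 0.0, 1.0
    while approx_power(nu1, nu2, hi, alpha) < power:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if approx_power(nu1, nu2, mid, alpha) < power:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(approx_power(4, 45, 16.8), lambda_for_power(4, 45, 0.9))
```

For $\nu_1 = 4$, $\nu_2 = 45$ the search should return a value of $\lambda$ close to the chart reading of 16.8 quoted in Illustration 1.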

To deal with inverse problems, such as (b) mentioned above, a graphical representation of the relation between $\nu_1$, $\nu_2$ and $\lambda$ for fixed $\alpha$ and $\beta$ will be most useful. Following the procedure described above for finding $\lambda$, charts have been constructed for $\alpha = 0.05$ and for two levels of power, $\beta = 0.5$ and 0.9, which are likely to be of practical interest (see Figs. 3(a), (b)). The charts give, to the approximation involved in (61), contours of equal power and could be used for determining any one of the three quantities, $\nu_1$, $\nu_2$ and $\lambda$, given the other two. When $\nu_2 = \infty$, the $F'$ reduces to $\chi'^2/\nu_1$, and hence these charts could also be used for answering the inverse questions connected with the power function of the $\chi^2$-test (see p. 215).

We give here two illustrations of the use of these charts. 










Illustration 1. To study the seasonal variation in the frequency of occurrence of a particular dominant alga in a pond, ten samples of 15 c.c. of water are taken from the pond on the first day of each of the five months, April to August. Fifteen drops are taken on slides from each sample after shaking it thoroughly, and the number of algae of the particular form are counted under the microscope and the total for the fifteen slides is taken as the density for each sample.

[Fig. 3a. Contours of equal power for the analysis of variance test with the systematic set-up: $\alpha = 0.05$, and a power $\beta(\nu_1, \nu_2, \lambda) = 0.5$. Ordinate: $\lambda$, from 0 to 20; abscissa: values of $\nu_2$, from $\infty$ down to 8; one contour for each value of $\nu_1$.]

To test whether there is significant variation in the density of this form of algae from month 
to month, the analysis of variance test is applied, say, at 5% level. It will be of interest to 
know how large should the ratio of the seasonal variability to the variability in the pond be, 
so that we could detect it with a 90% chance. 










Here, $\nu_1 = k-1 = 4$, $\nu_2 = k(n-1) = 45$. For these, the chart of Fig. 3(b) gives $\lambda = 16.8$, from which we find the ratio of between-month to within-month variability

$$S^2/\sigma_0^2 = \lambda/nk = 0.34, \text{ nearly}.$$





























































































































[Fig. 3b. Contours of equal power for the analysis of variance test with the systematic set-up: $\alpha = 0.05$, and a power $\beta(\nu_1, \nu_2, \lambda) = 0.9$. Ordinate: $\lambda$, from 0 to 45; abscissa: values of $\nu_2$, from $\infty$ down to 8; one contour for each value of $\nu_1$.]


This means that the odds are 9 to 1 on detecting differences at the 5% level if the S.D. of the density of the algae between months was 0.58 of the S.D. within the pond. On the other hand, using Fig. 3(a) we see that there would be a 50:50 chance of detecting differences if the S.D. between months is 0.38 of that of a single sample in a month $(S^2/\sigma_0^2 = 0.145)$.
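The arithmetic of Illustration 1 can be retraced in a few lines. In this sketch the chart readings (16.8 and 0.145) are simply the figures quoted in the text; the helper name `detectable_ratio` is an assumption made for the example.

```python
import math


def detectable_ratio(lam, k, n):
    """S^2 / sigma_0^2 implied by a value of lambda, from lambda = k n S^2 / sigma_0^2."""
    return lam / (k * n)


k, n = 5, 10                                   # five months, ten samples per month
nu1, nu2 = k - 1, k * (n - 1)                  # degrees of freedom
ratio_90 = detectable_ratio(16.8, k, n)        # chart reading for power 0.9
sd_ratio_90 = math.sqrt(ratio_90)              # ratio of standard deviations
sd_ratio_50 = math.sqrt(0.145)                 # ratio quoted for power 0.5

print(nu1, nu2, round(ratio_90, 2), round(sd_ratio_90, 2), round(sd_ratio_50, 2))
```

Taking square roots converts the variance ratios into the ratios of standard deviations used in the verbal statement of the result.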


Illustration 2. There are seven machines producing copper wire for electric cables. It is intended to control the variability in the thickness of the wire due to the machines by taking samples from time to time and testing for differences between the machines. From previous observations we have some idea of the order of variability in the product of a single machine; suppose we do not regard the variability between machines as serious if it does not exceed 0.25 of the within-machine variability. How many samples of wire must we take from each machine to have a 90% chance of detecting, at the 5% level for F, a between-machine variability of this magnitude, if it exists?


Since

$$\frac{\lambda}{\nu_2} = \frac{n}{n-1}\,\frac{S^2}{\sigma_0^2}$$

in virtue of (58), we have now to find $n$ satisfying the relation

$$\frac{\lambda}{\nu_2} = \frac{n}{n-1} \times 0.25.$$


Following the contour in chart 3(b) for $\nu_1 = 6$, we find by inspection a point on it at which the ratio of the co-ordinates is nearly 0.25. This point gives $\nu_2 = 75$, from which we obtain the number of samples required, $n = \nu_2/k + 1 = 75/7 + 1 = 12$, approximately. On the other hand, from 3(a) we find that we would have a 50% chance of detection if $n = 6$.
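The step from the chart reading $\nu_2 = 75$ to the sample size follows from $\nu_2 = k(n-1)$. A minimal sketch, with the helper name `samples_per_group` assumed for the example, and with the implied $\lambda$ computed from (58) at the ratio 0.25:

```python
import math


def samples_per_group(nu2, k):
    """Smallest whole n with k(n - 1) >= nu2, i.e. nu2/k + 1 rounded up."""
    return math.ceil(nu2 / k + 1)


k = 7                                   # seven machines, so nu1 = k - 1 = 6
n = samples_per_group(75, k)            # chart reading nu2 = 75 for power 0.9
lam = 0.25 * k * n                      # lambda = k n S^2 / sigma_0^2 at S^2/sigma_0^2 = 0.25
ratio_of_coordinates = lam / (k * (n - 1))

print(n, lam, ratio_of_coordinates)
```

The ratio of the co-ordinates at the rounded sample size equals $0.25\,n/(n-1)$, slightly above 0.25, which is why the text describes the chart reading as only nearly 0.25.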


In conclusion, I should like to acknowledge gratefully the help and guidance I have received from Prof. E. S. Pearson and Dr H. O. Hartley in the course of my investigations.


APPENDIX 


Distribution of the sum of squares of independent normal 
variates with different means and variances 


Let $\xi_1, \xi_2, \ldots, \xi_n$ be $n$ independent normal variates with expectations $b_1, b_2, \ldots, b_n$ and variances $v_1, v_2, \ldots, v_n$ respectively. The characteristic function of the statistic

$$\chi^2 = \sum_{j=1}^{n}\xi_j^2 \tag{1}$$

is easily obtained. Introducing $x_j = (\xi_j - b_j)/\sqrt{v_j}$, we note that each $x_j$ follows the probability law

$$\frac{1}{\sqrt{2\pi}}\,e^{-\frac12 x^2}$$

and that $\chi^2$ in (1) can be written as

$$\chi^2 = \sum v_j(x_j + a_j)^2, \tag{2}$$

where $a_j$ stands for $b_j/\sqrt{v_j}$. (All summations are from $j = 1$ to $n$.)

The characteristic function of $\chi^2$ is given by

$$\phi(t) = \prod_{j=1}^{n}\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp\{itv_j(x_j+a_j)^2 - \tfrac12 x_j^2\}\,dx_j\right]. \tag{3}$$

Each factor in (3) is equal to

$$(1-2itv_j)^{-\frac12}\exp\left\{\frac{itv_ja_j^2}{1-2itv_j}\right\}.$$

Hence

$$\phi(t) = \prod_{j=1}^{n}(1-2itv_j)^{-\frac12}\exp\left\{\sum\frac{itv_ja_j^2}{1-2itv_j}\right\}, \tag{4}$$

from which all the moments of the required distribution can be derived. We may represent this approximately by a $\chi^2$-distribution fitted from the first two moments, $\mu_1' = \sum v_j + \sum v_ja_j^2$ and $\mu_2 = 2\sum v_j^2 + 4\sum v_j^2a_j^2$.
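One way to carry out such a fit is to match the mean and variance of a scaled variate $c\chi^2_f$ to the two moments above. The text does not spell out the fitting step, so the scaled form, the function names and the numerical example below are assumptions made for illustration. Since $a_j = b_j/\sqrt{v_j}$, the moments are written directly in terms of $b_j$.

```python
def first_two_moments(variances, means):
    """Mean and variance of the sum of squares of normals with the given variances and means."""
    # v_j a_j^2 = b_j^2 and v_j^2 a_j^2 = v_j b_j^2, because a_j = b_j / sqrt(v_j).
    mu1 = sum(v + b * b for v, b in zip(variances, means))
    mu2 = sum(2.0 * v * v + 4.0 * v * b * b for v, b in zip(variances, means))
    return mu1, mu2


def fit_scaled_chi_square(mu1, mu2):
    """Scale c and degrees of freedom f such that c*f = mu1 and 2*c*c*f = mu2."""
    return mu2 / (2.0 * mu1), 2.0 * mu1 ** 2 / mu2


# Hypothetical example: four unit-variance variates with sum of squared means equal to 6.
mu1, mu2 = first_two_moments([1.0] * 4, [1.0, 1.0, 2.0, 0.0])
c, f = fit_scaled_chi_square(mu1, mu2)
print(mu1, mu2, c, f)
```

With unit variances and $\lambda = \sum b_j^2$ the fit gives $c = (n+2\lambda)/(n+\lambda)$ and $f = (n+\lambda)^2/(n+2\lambda)$, which agrees with the value of $\nu$ used in (61) when $n$ is read as $\nu_1$.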








































Next we consider the conditional distribution of $\chi^2$ in (2) subject to a single linear constraint on the $x_j$'s, viz. $\sum c_j(x_j + a_j) = p$.

The characteristic function of the joint distribution of $\chi^2$ and $p$ is given by

$$\phi(t, t_1) = \prod_{j=1}^{n}\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp\{-\tfrac12 x_j^2 + itv_j(x_j+a_j)^2 + it_1c_j(x_j+a_j)\}\,dx_j\right]. \tag{5}$$

On performing the integrations in (5) we find

$$\phi(t, t_1) = \prod_{j=1}^{n}(1-2itv_j)^{-\frac12}\exp\left\{\frac{2itv_ja_j^2 + 2it_1c_ja_j - t_1^2c_j^2}{2(1-2itv_j)}\right\}.$$

The conditional characteristic function of $\chi^2$, for fixed $p$ (Bartlett, 1938), is

$$\phi(t \mid p) = \int_{-\infty}^{\infty}e^{-it_1p}\phi(t, t_1)\,dt_1 \Big/ \int_{-\infty}^{\infty}e^{-it_1p}\phi(0, t_1)\,dt_1$$

$$= \left[\frac{1}{\sum c_j^2}\sum\frac{c_j^2}{1-2itv_j}\right]^{-\frac12}\prod(1-2itv_j)^{-\frac12}\exp\left\{\sum\frac{itv_ja_j^2}{1-2itv_j} - \frac12\,\frac{\left(p - \sum\dfrac{c_ja_j}{1-2itv_j}\right)^2}{\sum\dfrac{c_j^2}{1-2itv_j}} + \frac12\,\frac{(p - \sum c_ja_j)^2}{\sum c_j^2}\right\}. \tag{6}$$

The moments of the conditional distribution of $\chi^2$ can then be obtained from (6). Again, we may fit a Type III to the conditional distribution of $\chi^2$ by using the first two moments.
REFERENCES

Bartlett, M. S. (1938). J. Lond. Math. Soc. 13, 62.
Cornish, E. A. & Fisher, R. A. (1937). Rev. Inst. Int. Statist. 5, 307.
Daniels, H. E. (1939). J.R. Statist. Soc. Suppl. 6, 186.
David, F. N. (1947). Biometrika, 34, 339.
Fisher, R. A. (1928). Proc. Roy. Soc. A, 121, 654.
Fisher, R. A. (1931). Introduction to the Brit. Ass. Math. Tables, 1, 26.
Goldberg, H. & Levine, H. (1946). Ann. Math. Statist. 17, 216.
Haldane, J. B. S. (1937). Biometrika, 29, 133.
Hartley, H. O. (1948). Biometrika, 35, 417.
Hsu, P. L. (1941). Biometrika, 32, 62.
Johnson, N. L. (1948). Biometrika, 35, 80.
Johnson, N. L. & Welch, B. L. (1939). Biometrika, 31, 362.
Jorgensen, N. R. (1916). Undersogelser over Frequensflader og Korrelation. Copenhagen: Busck.
Lehmer, Emma (1944). Ann. Math. Statist. 15, 388.
Merrington, Maxine & Thompson, Catherine M. (1943). Biometrika, 33, 73.
Neyman, J. (1935). J.R. Statist. Soc. Suppl. 2, 131.
Neyman, J. & Tokarska, B. (1936). J. Amer. Statist. Ass. 31, 318.
Patnaik, P. B. (1948). Biometrika, 35, 157.
Pearson, K. (1900). Phil. Mag. 50, 157.
Pearson, K. (1916). Biometrika, 11, 145.
Pearson, K. (1922). Tables of the Incomplete Gamma Function. London: Biometrika.
Pearson, K. (1934). Tables of the Incomplete B-Function. London: Biometrika.
Tang, P. C. (1938). Statist. Res. Mem. 2, 126.
Thompson, Catherine M. (1941). Biometrika, 32, 188.
Wishart, J. (1932). Biometrika, 24, 441.







[ 233 ]


MISCELLANEA 
On a method of estimating frequencies 
By D. J. FINNEY 


Haldane (1945) has discussed the estimation of the frequency of an attribute by inverse binomial sampling, a method which requires that random sampling be continued until a specified quota of individuals with the attribute has been obtained. For example, in a study of an abnormality of blood cells which affects a proportion $p$ of red corpuscles, counts might be made until $m$ abnormal cells had been recorded. The probability that exactly $n$ cells in all are counted in order to give this quota of abnormals is

$$w_n = \binom{n-1}{m-1}p^mq^{n-m},$$

where $q = 1 - p$. Haldane showed that

$$x = \frac{m-1}{n-1}$$

is an unbiased estimate of $p$. He then investigated the variance of $x$, but did not give an unbiased estimate of this variance as a function of $x$.


The study of the sampling distribution of $x$ may be assisted by consideration of functions $U(\alpha, \beta)$, defined by

$$U(\alpha, \beta) = \binom{n-\alpha-\beta-1}{m-\alpha-1}\Big/\binom{n-1}{m-1}.$$

The average value of $U(\alpha, \beta)$ is

$$E\{U(\alpha, \beta)\} = \sum_{n=m}^{\infty}w_nU(\alpha, \beta) = p^\alpha q^\beta.$$

Haldane's formulae (2) for $\sigma^2$, the variance of $x$, may alternatively be derived by expanding $x^2$ in a series of $U(2, \beta)$, namely,

$$x^2 = U(2,0) + \frac{U(2,1)}{m} + \frac{2!\,U(2,2)}{m(m+1)} + \frac{3!\,U(2,3)}{m(m+1)(m+2)} + \ldots,$$

whence

$$\sigma^2 = E(x^2) - p^2 = \frac{p^2q}{m}\left[1 + \frac{2!\,q}{m+1} + \frac{3!\,q^2}{(m+1)(m+2)} + \ldots\right].$$

Define $s^2 = x^2 - U(2,0)$; it is then apparent that $E(s^2) = \sigma^2$.





This $s^2$, an unbiased estimate of the variance of $x$, differs from the function obtained by substitution of $x$ for $p$ in the formula for $\sigma^2$. Indeed, $s^2$ can be expressed very simply in terms of the sample, since

$$s^2 = \left(\frac{m-1}{n-1}\right)^2 - \frac{(m-1)(m-2)}{(n-1)(n-2)} = \frac{(m-1)(n-m)}{(n-1)^2(n-2)},$$

or

$$s^2 = \frac{x(1-x)}{n-2}.$$

This is the most convenient form for computation, but for the planning of a sampling investigation an expression in terms of $m$ is more suitable:

$$s^2 = \frac{x^2(1-x)}{m-1-x}.$$


1 
The exact formula for s* shows that Haldane’s approximation aS) is slightly biased. His conclusions 


about the choice of a quota in order to give a specified precision for the estimate of $p$ are not materially altered. Since

$$\frac{s}{x} = \sqrt{\frac{1-x}{m-1-x}},$$

which is approximately equal to $1/\sqrt{m-1}$ when $x$ is small, an upper limit to the size of the standard error relative to $x$ can be fixed in advance of sampling by an appropriate choice of $m$.
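The unbiasedness of both $x$ and $s^2$ can be checked numerically by summing over the distribution of $n$. The sketch below is an illustration under stated assumptions: the quota $m = 5$ and proportion $p = 0.3$ are arbitrary example values, the infinite sum is truncated at a point where the remaining probability is negligible, and the function names are introduced here.

```python
import math


def estimate(m, n):
    """The estimate x = (m-1)/(n-1) and the variance estimate s^2 = x(1-x)/(n-2)."""
    x = (m - 1) / (n - 1)
    s2 = x * (1 - x) / (n - 2)
    return x, s2


def prob_n(m, n, p):
    """Probability that exactly n individuals are counted in reaching the quota m."""
    return math.comb(n - 1, m - 1) * p ** m * (1 - p) ** (n - m)


def exact_expectations(m, p, n_max=600):
    """E(x), var(x) and E(s^2), summing over n = m, ..., n_max (tail assumed negligible)."""
    e_x = e_x2 = e_s2 = 0.0
    for n in range(m, n_max + 1):
        w = prob_n(m, n, p)
        x, s2 = estimate(m, n)
        e_x += w * x
        e_x2 += w * x * x
        e_s2 += w * s2
    return e_x, e_x2 - e_x ** 2, e_s2


e_x, var_x, e_s2 = exact_expectations(m=5, p=0.3)
print(e_x, var_x, e_s2)
```

The first value should reproduce $p$ and the last two should agree with each other, which is the content of $E(x) = p$ and $E(s^2) = \sigma^2$.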

The standard error is a satisfactory indicator of the error of estimation of p only when m is large. For 
small m, limits of error (analogous to fiducial limits but logically distinct because of the discrete nature 
of the distribution) can be read from Fisher & Yates's (1948) Table VIII1, by the following rules:

(i) The lower limit is the lower limit for a direct binomial sample which has m successes in n trials; 
enter the table with a = m, N = n. 

(ii) The upper limit is the upper limit for a direct binomial sample which has (m — 1) successes in (n — 1) 
trials; enter the table with a = (m—1), N = (n—1). 

These limits are the highest and lowest values of p which just fail to be contradicted by the sample, 
in a significance test based upon the chosen level of the probability (Finney, 1947). 

In direct binomial sampling, the total number of individuals bearing the attribute, in a sample of N, 
is usually the only record that is made. Since the practice of inverse sampling involves the collection of 
the data in order, the sample can also provide evidence of whether the condition of independence of suc- 
cessive observations is fulfilled. If successive individuals are independent of one another, (m— 1) having 
the attribute should be distributed at random intervals throughout the first (n — 1) counted; a departure 
from independence, such as would result from a clustering of abnormal cells, will increase the frequency 
of short and of long intervals between these individuals at the expense of intervals of moderate length.
A test of significance might be based upon the observed frequency with which abnormals are preceded 
and followed by normals (Wishart & Hirschfeld, 1936; Iyer, 1947), or on some other statistic based upon 
the length of intervals. Of course, significant deviation from the value predicted by a hypothesis of 
randomness of intervals, whether resulting from a clustering of abnormals or (a less common pheno- 
menon) from an exceptional regularity of intervals, would indicate that the standard error and limits 
of error discussed in this note were not applicable. 


REFERENCES 


Finney, D. J. (1947). Errors of estimation in inverse sampling. Nature, Lond., 160, 195.

Fisher, R. A. & Yates, F. (1948). Statistical Tables for Agricultural, Biological and Medical Research, 3rd ed. Edinburgh: Oliver and Boyd.

Haldane, J. B. S. (1945). On a method of estimating frequencies. Biometrika, 33, 222-5.

Iyer, P. V. K. (1947). Random association of points on a lattice. Nature, Lond., 160, 714.

Wishart, J. & Hirschfeld (Hartley), H. O. (1936). A theorem concerning the distribution of joins between line segments. J. Lond. Math. Soc. 11, 227-35.


A further note on the mean deviation from the median 
By K. R. NAIR 


In an earlier (1947) note the author compared the standard errors of unbiased estimates of the standard deviation $\sigma$ of a normal population obtained from the 'mean deviation from mean', $m$, and the 'mean deviation from median', $m'$, in a random sample of size $n$, when $n = 2$, 3 and 4. The problem reduced to the comparison of coefficients of variation of $m$ and $m'$.

(i) When $n = 2$, $m$ and $m'$ are identical, and hence so are their standard errors and coefficients of variation.

(ii) When $n = 3$, $m$ and $m'$ are not identical. The C.V. of $m$ is

$$\sqrt{\tfrac13\left(\tfrac23\pi + \sqrt3 - 3\right)} = 0.52486. \tag{1}$$

The author was not aware of any exact formula for the C.V. of $m'$, but from published numerical tables found it to agree with the right-hand side of (1) to five places of decimals. A recent paper by Jones* (1948) enables us to calculate the exact expression for the C.V. of $m'$ when $n = 3$. It comes out to be identical with the left-hand side of (1).

(iii) When $n = 4$, Jones has derived expressions for the second moments and the product moments of order statistics but not for the first moments. Hence no exact expressions could be derived from his results for the C.V. of $m'$ when $n = 4$. The numerical value the author has calculated from the exact distribution of $m'$ in his (1947) note is perhaps the best approximation known so far.


* I am indebted to Dr Churchill Eisenhart for drawing my attention to Mr Jones’s paper. 










(iv) The exact sampling distribution of $m'$ has been worked out by Godwin (unpublished; see Nair, 1948), but its probability integral is not very easy to tackle when $n > 4$. Exact expressions for the S.E. of $m'$ when $n > 4$ are also not available at present.

A recent paper by Hastings, Mosteller, Tukey & Winsor (1947) gives tables of means, variances and covariances of order statistics for samples of size $2 \le n \le 10$. The covariances are believed to be correct to within 1 unit in the second decimal (except for one or two values which may be off by two units).

With the help of these tables, the C.V. of $m'$ was calculated for $n = 2$ to 10 to two places of decimals, and the values are given in Table 1 alongside corresponding values of the C.V. of $m$ obtained by substitution in the exact formula of the latter.


Table 1. Coefficient of variation of m and m'

  n     C.V. of m     C.V. of m'
  2       0.76          0.75
  3       0.52          0.52
  4       0.43          0.43
  5       0.37          0.37
  6       0.33          0.34
  7       0.31          0.31
  8       0.28          0.29
  9       0.27          0.26
 10       0.25          0.25

















There is an error of 1 unit in the second decimal in the value of the C.V. of $m'$ for $n = 2$, since theoretically it should have been the same as the C.V. of $m$. For $n = 3$ also both C.V.'s are theoretically identical and Table 1 is in agreement with this fact. The C.V. of $m'$ for $n = 4$ obtained in the author's (1947) note was 0.44 compared to the value 0.43 given in Table 1. It is believed that the error is within 1 unit in the second decimal in the values given in col. (3) for $n > 4$.

The closeness of the values of the C.V. of $m$ and $m'$ in the range of sample sizes $2 \le n \le 10$ reinforces the strong practical grounds, namely, greater simplicity in calculation, for using $m'$ rather than $m$.
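The column for $m$ in Table 1 can be reproduced from a closed-form expression. The note does not quote its exact formula, so the expression used in the sketch below is an assumption: it is the normal-theory result for the squared coefficient of variation of the mean deviation from the mean, $[\tfrac12\pi + \sqrt{n(n-2)} - n + \arcsin\{1/(n-1)\}]/n$, which reduces to equation (1) when $n = 3$. The function name is introduced here.

```python
import math


def cv_mean_deviation(n):
    """Coefficient of variation of the mean deviation from the mean, normal samples of size n."""
    inner = math.pi / 2 + math.sqrt(n * (n - 2)) - n + math.asin(1 / (n - 1))
    return math.sqrt(inner / n)


# Equation (1) of the note, for n = 3.
cv3 = math.sqrt((2 * math.pi / 3 + math.sqrt(3) - 3) / 3)

table_column = [round(cv_mean_deviation(n), 2) for n in range(2, 11)]
print(round(cv3, 5), table_column)
```

Rounded to two decimals, the values for $n = 2$ to 10 can be compared line by line with the second column of Table 1.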


REFERENCES 


Hastings, C., Mosteller, F., Tukey, J. W. & Winsor, C. P. (1947). Ann. Math. Statist. 18, 413.

Jones, H. L. (1948). Ann. Math. Statist. 19, 270.

Nair, K. R. (1947). Biometrika, 34, 360.

Nair, K. R. (1948). Biometrika, 35, 118.








[ 236 ] 


REVIEWS 


Theory of Probability. By Harold Jeffreys. (Second edition.) Oxford University Press. Price 30s.


The first edition of this book appeared in 1940 and was missed by many (including the reviewer) who
were on war service. This second edition, according to the author, has had added to it further arguments 
which go far towards establishing the principle of inverse probability ; also a theory of invariance has been 
developed and applied to problems of estimation and significance. On reading the book one cannot help 
being sharply aware of the breadth of reading and the width of scientific knowledge which is the 
author’s, but the reviewer, at any rate, remains unrepentantly unconvinced by the subject-matter. 
Prof. Jeffreys’s approach to the theory of probability is too well known to require exposition here. 
Generally, it may be summed up by quoting the idea so often expressed in this book—‘no probability 
is simply a frequency’. This is a perfectly legitimate point of view, even if it is not shared by many 
whose business in life is the provision of an objective criterion in the shape of a probability figure which 
is to serve as a guide for future action on the part of others. In practice, the writer suspects, Prof. 
Jeffreys would calculate a probability in the same way as other persons who follow (say) the frequency 
approach to the subject, but in theory it would be possible for two persons following Jeffreys to obtain 
different values for the same probability. For the author writes: ‘Our main postulates are the existence 
of unique reasonable degrees of belief, which can be put in a definite order.’ It does not seem possible 
so to standardize degrees of belief that the order will be the same for each and every person, and it 
would appear therefore that the uniqueness can really only apply to the single person. Actually the 
calculation and interpretation of a probability is a compound process. By some objective standards 
a numerical figure is reached, and should be reached, by a logical process which in its application leads
all persons to the same conclusion. It is in the interpretation of this numerical figure that the subjective 
factor will enter, the final judgement being the result of a complexity of impressions in the interpreter’s 
mind, varying from his psychological make-up to the consequences involved by the taking of a wrong 
decision. For each person this interpretation must inevitably be different. 

Prof. Jeffreys writes at length both on theories of testing hypotheses and of estimation. His ideas 
on the Neyman-Pearson concept of a power-function seem a little hazy, but much of what he has to 
say is both pertinent and interesting. It is, however, a little disconcerting to be told by the foremost 
exponent of inverse probability and the applications of Bayes’s theorem that there is general agreement 
between himself and Prof. R. A. Fisher. Many of us first disentangled ourselves from Bayes with the 
help of R. A. Fisher’s papers on inverse probability—I think in particular of ‘Uncertain Inference’ 
(Proc. Amer. Acad. Arts Sci. 1936)—and are inclined to find relevant the remark and the quotation 
made by Coolidge: ‘We use Bayes’s formula with a sigh, as the only thing available under the circum- 
stances: ‘Steyning tuk him for the reason the thief tuk the hot stove—bekaze there was nothing else 
that season.”’ But it may be that until the statistical theory of estimation is placed on more secure 
foundations, the battle of Bayes will have to be fought by each generation. 

This book is not suitable for students starting to read probability and statistics. It may profitably 
be read by those of some maturity, for Prof. Jeffreys’s ideas and examples will materially help in the 
interpretation of statistical theory no matter what school of probability the reader favours. 


F. N. DAVID 


Karl Pearson's Early Statistical Papers. 557 pp. Cambridge University Press (for the 
Biometrika Trustees). 1948. Price 21s. 


This volume should find a place on the shelves of every statistical library. It includes reprints of eleven 
of Karl Pearson’s early memoirs, from 1894 to 1916, memoirs of which reprints have for many years 
been unobtainable. They have been reproduced photographically by the litho-offset process, so that 
the reprints can be trusted beyond question as facsimiles of the originals, without any of the doubts to 
which resetting might occasionally give rise. Those reprinted from the Philosophical Transactions and 
the Drapers’ Company Research Memoirs have been reproduced, apparently, with a slight reduction in 
size which makes no appreciable difference in legibility, while the page of the χ²-memoir reprinted from
the Philosophical Magazine has been enlarged sufficiently to give an increase of legibility very pleasant 
to aged sight. Messrs Bradford and Dickens can be congratulated on the excellence of the printer's 
work, and the price of the volume is astonishingly low for a quarto volume of over 550 pages. The one 
disadvantage of the process used, which necessitates a coated paper, is that the volume is very heavy 
(4½ lb.), but this is hardly a serious fault in a volume not intended for idle hours. In effect the papers 
cover a period of eleven years only, 1894-1905, for the isolated paper of 1916 is the Second Supplement 
to the Memoir on Skew Variation of 1895. 

One who took part in the work associated with several of these early papers may be forgiven for 
indulging in affectionate recollections in a brief survey. The first memoir reprinted here is inevitably the 
rather insufficiently entitled ‘Contribution to the Mathematical Theory of Evolution’, which actually 
deals with the problem of dissecting a distribution assumed to be compounded of two normal curves. 
The memoir is of historical interest, apart from its special problem, as the method of moments is used for 
fitting and the term ‘standard deviation’ is introduced, I think, for the first time in print. Weldon’s 
measurements on crabs were used for illustration in the original memoir, but much fuller illustration, 
on skull measurements, was given in the Phil. Mag. paper of 1901 (6th series, vol. 1, pp. 110-24). 
I well remember the work of calculating moments up to the fifth for this memoir or for work subsequent 
thereto, and the solution of the nonic. 

The second memoir reprinted is the well-known initial memoir (1895) on Pearson’s family of skew 
curves, ‘Skew Variation in Homogeneous Material’, which was supplemented later by two further 
memoirs, published in 1901 and 1916, dealing with certain subtypes which had escaped attention. Each 
of the three memoirs is characteristically and very fully illustrated by examples, so fully that it is hardly 
too much to say that these memoirs created a revolution in the general view as to the characteristics 
of frequency distributions. Previously, the ‘normal distribution’, as implied by its name, had been 
regarded as the common type and other forms as somewhat odd and aberrant types. Henceforward, 
skew distributions of various degrees of skewness had to be recognized as the common forms, and the 
normal distribution as a highly exceptional limiting type. I recall working on some of the examples 
in the memoir of 1895, and doing some of the drawings of theoretical types, in the course of which the 
approximate relation between mean, median and mode was discovered, the median being determined 
by the use of an Amsler planimeter on the graph. A good deal of the preliminary work in this paper, on 
formulae for ‘correcting’ moments, has been superseded, but the memoir well illustrates the flexibility 
of the method. 

After these two studies of variation in a single variate, studies in bivariate and multivariate variation 
naturally follow and the third reprint (Evolution series, III, 1896) deals with that subject, but (and 
this is surely quite characteristic) not in purely symbolic and mathematical terms as a general method, 
but entirely with reference to the biological field, under the title ‘Regression, Heredity and Panmixia’. 
The procedure may have some advantage in illustrating the application of the statistical method to 
real problems, but it certainly has the very real disadvantage of obscuring what has been discovered in 
general method. In this paper Pearson assumes the normal law and uses the form given by Bravais, 
but introduces a single symbol r to replace the quantity S(xy)/(nσ₁σ₂), and shows that this expression for 
r makes the observed result the most probable, and hence it is the ‘best’ value for the coefficient, and 
preferable to the method of determination used by Galton and Weldon. There is a short section on the 
probable error of the coefficient of correlation so determined, leading to a result which was recognized 
as erroneous in the subsequent memoir (IV) by Pearson and Filon. The mistaken value was actually 
the s.e. of r for assigned errors in σ₁ and σ₂. The fifth reprint, ‘On the Reconstruction of the Stature of 
Prehistoric Races’, also carries somewhat further the theory of correlation, but the story of its develop- 
ment is an astonishingly tangled one. 
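
The expression S(xy)/(nσ₁σ₂) for r can be illustrated numerically. What follows is only a minimal sketch in Python and is not taken from the memoir: it assumes that x and y are measured as deviations from their respective means and that σ₁ and σ₂ are standard deviations computed with divisor n. The function name product_moment_r and the sample figures are hypothetical.

```python
import math


def product_moment_r(xs, ys):
    """Return r = S(xy) / (n * sigma_1 * sigma_2).

    Assumption: x and y are taken as deviations from their means, and
    sigma_1, sigma_2 are standard deviations with divisor n.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 values")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]  # deviations of x from its mean
    dy = [y - mean_y for y in ys]  # deviations of y from its mean

    s_xy = sum(a * b for a, b in zip(dx, dy))       # S(xy)
    sigma_1 = math.sqrt(sum(a * a for a in dx) / n)  # s.d. of x
    sigma_2 = math.sqrt(sum(b * b for b in dy) / n)  # s.d. of y
    if sigma_1 == 0 or sigma_2 == 0:
        raise ValueError("r is undefined when either series is constant")

    return s_xy / (n * sigma_1 * sigma_2)


if __name__ == "__main__":
    # Hypothetical figures, chosen only to exercise the function.
    print(round(product_moment_r([1, 2, 3], [1, 3, 2]), 6))
```

With divisor n in both standard deviations the result lies between -1 and +1, reaching the limits only when the two series are exactly linearly related.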

The ground having been cleared by the discussion of the theory of frequency curves and the theory 
of correlation, problems of sampling had become of much more importance, and the fourth memoir of 
the ‘Evolution’ series, ‘On the probable errors of Frequency Constants and on the Influence of Random 
Selection on Variation and Correlation’ (1898) is the fourth memoir selected for reprinting. It was 
written, it will be remembered, in conjunction with L. N. G. Filon, who was then Pearson’s 
Demonstrator. The sixth reprint is that of the classical paper on the χ² method of testing ‘Goodness of Fit’ 
(Phil. Mag. 1900), to which a list of the papers on the same subject that followed it would have been 
a useful appendix. 

The eighth reprint is that of the Philos. Trans. memoir ‘On the Mathematical Theory of Errors of 
Judgment with Special Reference to the Personal Equation’ (1902), a paper which still seems to me 
one of considerable interest and importance. The paper was based on two series of experiments, the 
first carried out in 1896 by Karl Pearson, Alice Lee and myself on bisecting segments of straight lines 
by estimation, the second, in which Karl Pearson, Alice Lee and W. R. Macdonell were the observers, 
on estimating the position of a bright line on a strip. The first series seems to me particularly interesting 
as showing the number of useful warnings that can be drawn from experiments of the simplest type 
requiring no special apparatus. 

With the work of the ninth and tenth reprints ‘On the Theory of Contingency’ (1904) and ‘On the 
general Theory of Skew Correlation’ (the memoir in which Pearson describes the correlation ratio and 
its uses) I had nothing to do, and may content myself with the bare mention of them, though both are 
of importance in the history of statistical method. Enough has been said to show the high value of the 
volume, which is a real ‘treasury’ of Pearson’s work. 





G. UDNY YULE 


(All Rights reserved) 


BIOMETRIKA. Vol. XXXVI, Parts I and II 

CONTENTS 

The infectiousness of measles. By Major Greenwood 
A note on the analysis of grouped probit data. By K. D. Tocher 
A generalization of Poisson’s binomial limit for use in ecology. By Marjorie Thomas 
The estimation and comparison of residual regressions where there are two or more related sets 
of observations. By A. H. Carter 
Cumulants of multivariate multinomial distributions. By John Wishart 
On the Wishart distribution in statistics. By A. C. Aitken 
The spectral theory of discrete stochastic processes. By P. A. P. Moran 
On a property of distributions admitting sufficient statistics. By V. S. Huzurbazar  71-74 
On a method of trend elimination. By M. H. Quenouille  75-91 
On the estimation of dispersion by linear systematic statistics. By H. J. Godwin  92-100 
On the reconciliation of theories of probability. By M. G. Kendall  101-116 
The derivation and partition of χ² in certain discrete distributions. By H. O. Lancaster  117-129 
A note on the subdivision of χ² into components. By J. O. Irwin  130-134 
The first and second moments of some probability distributions arising from points on a lattice 
and their application. By P. V. Krishna Iyer  135-141 
Probability tables for the range. By E. J. Gumbel  142-148 
Systems of frequency curves generated by methods of translation. By N. L. Johnson  149-176 
Rank and product-moment correlation. By M. G. Kendall  177-193 
Tests of significance in harmonic analysis. By H. O. Hartley  194-201 
The non-central χ²- and F-distributions and their applications. By P. B. Patnaik  202-232 

MISCELLANEA 

On a method of estimating frequencies. By D. J. Finney 
A further note on the mean deviation from the median. By K. R. Nair 

REVIEWS 

Theory of Probability 
Karl Pearson’s Early Statistical Papers 





A volume of Biometrika containing about 400 pages, with plates and tables, will be published annually in two 
half-yearly issues. 

Papers for publication should either be sent to 

PROFESSOR E. S. PEARSON, Department of Statistics, University College, London, W.C. 1, 
or if more convenient to 
Dr John Wishart, Statistical Laboratory, St Andrew’s Hill, Cambridge. 
M. G. Kendall, Esq., Cudham Court, Cudham, near Knockholt, Kent. 

It is a condition of publication in Biometrika that the paper shall not already have been issued elsewhere, and will not be 
reprinted without leave of the Editors. 

Contributors receive 25 copies of their papers free. Joint authors 15 copies each. 

The subscription price, payable in advance, is Inland 45s. net per volume and Abroad 54s. net (including packing and 
postage). Owing to the scarcity of early volumes, the following rates must now be charged for complete sets. Vols. I-XXXIV, 
including XXA: £133 in wrappers, not including postage. At present certain volumes are out of print, but steps are being 
taken to re-issue these as quickly as printing facilities permit. Recent volumes may still be obtained at the wrapper price; 
this is 64s. inland, including postage. Index to Vols. I to V, 2s. net. Index to Vols. I to XV, 5s. net. Cheques should be 
made payable to Biometrika, crossed “a/c Biometrika Trust” and sent to The Secretary, Biometrika Office, Department of 
Statistics, University College, London, W.C. 1. 
All foreign cheques must be drawn in sterling and on a Bank having a London Agency. 























First Printed in Great Britain at the University Press, Cambridge 
Reprinted by offset-litho by Percy Lund Humphries & Co., Ltd., Bradford 